Solved

Parsing out data and adding it to an array

Posted on 2015-01-31
Last Modified: 2015-02-01
I have an HTML file that I need to parse for information and put that information into a number of arrays.

Can you put together an example of how I can do this?

I've got something like this:

$string = <<<EOD

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


So for each <div class="container">  I need to pull out the text between:

<h3 class="main">          </h3>

<div class="listing">      </div>

<span class "wrap">      </span>

and make each into its own array, so I can later say:

<p>
echo $main[0] . '<br />';
echo $listing[0] . '<br />';
echo $wrap[0] . '<br />';
</p><p>
echo $main[1] . '<br />';
echo $listing[1] . '<br />';
echo $wrap[1] . '<br />';
</p><p>
echo $main[2] . '<br />';
echo $listing[2] . '<br />';
echo $wrap[2] . '<br />';
</p>

There will be 25 repetitions of this in all.

I know some PHP basics, of course, but I don't know how to write a while or foreach loop, or how to pull the data out from between the tags. I can certainly work from a good example if you can put one together for me.

Thanks!   Chris
Question by:St_Aug_Beach_Bum
 
LVL 62

Expert Comment

by:gheist
ID: 40582378
So you want to extract data from a webpage?
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582398
Sounds like you're looking for a scraper. If your PHP knowledge is that weak, you might be better off finding a library that does it.
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582413
[link to competing web site removed]
LVL 110

Expert Comment

by:Ray Paseur
ID: 40582481
Is this really the data you're working with?  It's not even valid HTML!
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 40582520
This does what you're asking for, but I have a feeling that there is a "backstory" here, and if we understood that we might be able to lead you to a more suitable solution.
http://iconoun.com/demo/temp_staug.php 

Web scraping is fraught with risk and you should expect any web scraping script to fail at any time without notice, so don't depend on this automation to do anything important, or to produce any output that goes directly into another automated process.  The reason this is risky is that publishers can, and do, tinker with their HTML documents all the time.  Any and all assumptions about the structure of an HTML document are at risk.
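One way to act on this warning is to make the script fail loudly as soon as the page stops matching expectations, rather than passing bad data along. The following is only a minimal sketch: the function name `checkScrape`, the sample arrays, and the expected count of 3 are assumptions made for illustration and are not part of the original script.

```php
<?php
/**
 * Hypothetical guard: confirm that every extracted array holds the expected
 * number of entries before the data is used anywhere else.
 */
function checkScrape(array $columns, int $expected): void
{
    foreach ($columns as $name => $values) {
        $found = count($values);
        if ($found !== $expected) {
            throw new UnexpectedValueException(
                "Expected $expected '$name' entries, found $found; the page layout may have changed."
            );
        }
    }
}

// Sample data standing in for the results of a scrape (assumed values)
$main    = ['title here', 'title 2 here', 'title 3 here'];
$listing = ['more text here', 'more text here 2', 'more text here 3'];
$wrap    = ['wrap up text', 'wrap up text 2'];   // one entry missing on purpose

try {
    checkScrape(['main' => $main, 'listing' => $listing, 'wrap' => $wrap], 3);
    $status = 'ok';
} catch (UnexpectedValueException $e) {
    // Stop here instead of feeding incomplete data to another process
    $status = $e->getMessage();
}

echo $status . PHP_EOL;
```

A check like this does not make scraping reliable; it only turns a silent failure into a visible one.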

A safer and better approach to getting external data is to ask the publisher to expose an API.  APIs are typically version-controlled and stable.  API version 1.0 will always produce the same document (probably XML or JSON) and will not vary, so you can depend on the format.  API version 1.1 will represent improvements and additions to version 1.0.  Things won't really change until you get to API version 2+, etc.  
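For comparison, here is a rough sketch of what consuming such an API might look like if the publisher returned JSON. The payload, its field names (`version`, `items`, `main`, `listing`, `wrap`), and its structure are all invented for illustration; a real API would define its own format.

```php
<?php
// Hypothetical API response; a real script would fetch this from the publisher's endpoint
$json = <<<'EOD'
{
  "version": "1.0",
  "items": [
    {"main": "title here",   "listing": "more text here",   "wrap": "wrap up text"},
    {"main": "title 2 here", "listing": "more text here 2", "wrap": "wrap up text 2"}
  ]
}
EOD;

// Decode to associative arrays; JSON_THROW_ON_ERROR (PHP 7.3+) raises a JsonException on bad input
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Build the three parallel arrays the question asks for
$main    = array_column($data['items'], 'main');
$listing = array_column($data['items'], 'listing');
$wrap    = array_column($data['items'], 'wrap');

foreach ($main as $i => $title) {
    echo $title . ' | ' . $listing[$i] . ' | ' . $wrap[$i] . PHP_EOL;
}
```

No pattern matching against markup is needed here, which is why a documented format tends to be less fragile than scraped HTML.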

If the publisher wants you to be able to use its data, it should expose an API for you. However, this may come at a cost, since the publisher is the copyright holder and can legally charge for the use of its data.

<?php

/**
 * See http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28607748.html
 */
error_reporting(E_ALL);

// TEST DATA CREATED USING NOWDOC SYNTAX
$str = <<<'EOD'

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


// THE SIGNAL INFORMATION
$trap['main']    = [ '<h3 class="main">',      '</h3>' ];
$trap['listing'] = [ '<div class="listing">',  '</div>' ];
$trap['wrap']    = [ '<span class "wrap">',    '</span>' ];


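// EXTRACT THE DATA AND BUILD NEW ARRAYS
foreach ($trap as $var => $arr)
{
    // PASS THE DELIMITER TO preg_quote() SO ANY '#' IN A SIGNAL STRING IS ESCAPED
    // (.*?) IS A LAZY MATCH: IT STOPS AT THE FIRST CLOSING TAG, NOT THE LAST
    $rgx = '#' . '(' . preg_quote($arr[0], '#') . ')(.*?)(' . preg_quote($arr[1], '#') . ')#';
    preg_match_all($rgx, $str, $mat);

    // VARIABLE VARIABLE: CREATES $main, $listing AND $wrap FROM THE ARRAY KEYS
    $$var = $mat[2];
}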


// SHOW THE WORK PRODUCTS
// ITERATE OVER $main SO AN EMPTY RESULT SET PRINTS NOTHING INSTEAD OF RAISING A NOTICE
foreach ($main as $kount => $title)
{
    echo '<p>';
    echo $title                    . '<br />';
    echo ($listing[$kount] ?? '')  . '<br />';
    echo ($wrap[$kount]    ?? '')  . '<br />';
    echo '</p>' . PHP_EOL;
}
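The script above matches literal tag strings, so it depends on the markup being written exactly as expected. As a possible alternative, here is a sketch that uses PHP's DOM extension and XPath instead. It is not the accepted solution's method, and it assumes well-formed markup in which the span is written `class="wrap"` (the sample in the question omits the `=`). The trimmed-down test data is also an assumption.

```php
<?php
// Assumed test data: valid HTML, with class="wrap" written correctly
$html = <<<'EOD'
<html>
<body>
<div class="container">
<h3 class="main">title here</h3>
<div class="listing">more text here</div>
<span class="wrap">wrap up text</span>
</div>
<div class="container">
<h3 class="main">title 2 here</h3>
<div class="listing">more text here 2</div>
<span class="wrap">wrap up text 2</span>
</div>
</body>
</html>
EOD;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$main = $listing = $wrap = [];

// Visit each container and read the three values relative to it.
// string() yields '' when an element is missing, so the arrays stay aligned.
foreach ($xpath->query('//div[@class="container"]') as $container) {
    $main[]    = trim($xpath->evaluate('string(.//h3[@class="main"])', $container));
    $listing[] = trim($xpath->evaluate('string(.//div[@class="listing"])', $container));
    $wrap[]    = trim($xpath->evaluate('string(.//span[@class="wrap"])', $container));
}

foreach ($main as $i => $title) {
    echo '<p>' . $title . '<br />' . $listing[$i] . '<br />' . $wrap[$i] . '<br /></p>' . PHP_EOL;
}
```

Note that `@class="main"` only matches when the attribute value is exactly `main`; elements carrying several classes would need a different XPath test. The same caveats about scraping still apply: a parser tolerates whitespace and attribute-order changes, but not a redesigned page.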

 

Author Closing Comment

by:St_Aug_Beach_Bum
ID: 40582748
Yikes, ok all.

Thank you for the help. It's not the actual HTML; I just threw that together as an example so I can work from it. I'm pulling data from a number of sites for a project on trends, not a critical application. I looked at several spidering services, but they didn't do quite what I wanted and were costly for a small project.

Thanks again.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 40582770
OK, good luck with it.  As long as you understand the risks...

best regards, ~Ray