Solved

Parsing out data and adding it to an array

Posted on 2015-01-31
Last Modified: 2015-02-01
I have an HTML file that I need to parse, putting the extracted information into several arrays.

Can you put together an example of how I can do this?

I've got something like this:

$string = <<<EOD

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


So for each <div class="container">  I need to pull out the text between:

<h3 class="main">          </h3>

<div class="listing">      </div>

<span class "wrap">      </span>

and make each into its own array, so I can later say:

<p>
echo $main[0] . '<br />';
echo $listing[0] . '<br />';
echo $wrap[0] . '<br />';
</p><p>
echo $main[1] . '<br />';
echo $listing[1] . '<br />';
echo $wrap[1] . '<br />';
</p><p>
echo $main[2] . '<br />';
echo $listing[2] . '<br />';
echo $wrap[2] . '<br />';
</p>

There will be 25 of these container blocks in all.

I know some PHP basics, of course, but I don't know how to set up a while or foreach loop for this, or how to pull out the data from between the tags. I can certainly work from a good example if you can put one together for me.

Thanks!   Chris
Question by:St_Aug_Beach_Bum
 
LVL 61

Expert Comment

by:gheist
ID: 40582378
So you want to extract data from a webpage?
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582398
Sounds like you're looking for a scraper. If your PHP knowledge is that weak, you might be better off finding a library that does it.
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582413
[link to competing web site removed]
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40582481
Is this really the data you're working with?  It's not even valid HTML!
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 40582520
This does what you're asking for, but I have a feeling that there is a "backstory" here, and if we understood that we might be able to lead you to a more suitable solution.
http://iconoun.com/demo/temp_staug.php

Web scraping is fraught with risk and you should expect any web scraping script to fail at any time without notice, so don't depend on this automation to do anything important, or to produce any output that goes directly into another automated process.  The reason this is risky is that publishers can, and do, tinker with their HTML documents all the time.  Any and all assumptions about the structure of an HTML document are at risk.
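One cheap safeguard against that kind of silent breakage is to check the extracted arrays before using them. A minimal sketch, assuming the scraped columns are gathered into a single array first; `columnsLineUp()` is a hypothetical helper name, not part of any library:

```php
<?php
/**
 * Hypothetical helper: true only when every scraped column is
 * non-empty and all columns hold the same number of entries.
 */
function columnsLineUp(array $columns): bool
{
    $counts = array_map('count', $columns);

    return $counts !== []
        && min($counts) > 0
        && count(array_unique($counts)) === 1;
}

// Made-up data: $good is what a healthy scrape looks like,
// $bad is what appears after the publisher changes a tag.
$good = ['main' => ['a', 'b'], 'listing' => ['c', 'd'], 'wrap' => ['e', 'f']];
$bad  = ['main' => ['a', 'b'], 'listing' => ['c'],      'wrap' => []];

var_dump(columnsLineUp($good)); // bool(true)
var_dump(columnsLineUp($bad));  // bool(false)
```

If the check fails, it is probably better to stop and look at the page by hand than to pass mismatched arrays along.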

A safer and better approach to getting external data is to ask the publisher to expose an API.  APIs are typically version-controlled and stable.  API version 1.0 will always produce the same document (probably XML or JSON) and will not vary, so you can depend on the format.  API version 1.1 will represent improvements and additions to version 1.0.  Things won't really change until you get to API version 2+, etc.  
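To illustrate, here is a rough sketch of what consuming such an API could look like. The JSON document, its `version` field and its `items` layout are all invented for the example; a real publisher's format will differ, and a real client would fetch the document over HTTP rather than hard-code it.

```php
<?php
// Hypothetical API response (made up for this example).
$json = '{"version":"1.0","items":['
      . '{"main":"title here","listing":"more text here","wrap":"wrap up text"},'
      . '{"main":"title 2 here","listing":"more text here 2","wrap":"wrap up text 2"}'
      . ']}';

// Decode to associative arrays; throw instead of returning null on bad JSON.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Accept any 1.x document; refuse other major versions rather than guess.
if (strtok($data['version'], '.') !== '1') {
    throw new RuntimeException('Unsupported API version: ' . $data['version']);
}

// One array per field, indexed in document order.
$main    = array_column($data['items'], 'main');
$listing = array_column($data['items'], 'listing');
$wrap    = array_column($data['items'], 'wrap');

echo $main[1] . PHP_EOL; // title 2 here
```

No pattern matching is needed at all here, which is the main attraction of an API over scraping.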

If the publisher wants you to be able to use its data, it should expose an API for you. However, this may come at a cost, since the publisher is the copyright holder and can legally charge for the use of its data.

<?php

/**
 * See http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28607748.html
 */
error_reporting(E_ALL);

// TEST DATA CREATED USING NOWDOC SYNTAX
$str = <<<'EOD'

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


// THE SIGNAL INFORMATION
$trap['main']    = [ '<h3 class="main">',      '</h3>' ];
$trap['listing'] = [ '<div class="listing">',  '</div>' ];
$trap['wrap']    = [ '<span class "wrap">',    '</span>' ];


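// EXTRACT THE DATA AND BUILD NEW ARRAYS
foreach ($trap as $var => $arr)
{
    // QUOTE THE SIGNAL STRINGS (INCLUDING THE # DELIMITER) AND CAPTURE THE TEXT BETWEEN THEM NON-GREEDILY
    $rgx = '#(' . preg_quote($arr[0], '#') . ')(.*?)(' . preg_quote($arr[1], '#') . ')#';
    preg_match_all($rgx, $str, $mat);

    // VARIABLE VARIABLE: CREATES $main, $listing AND $wrap FROM THE KEYS OF $trap
    $$var = $mat[2];
}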


// SHOW THE WORK PRODUCTS
$total = count($main);
for ($kount = 0; $kount < $total; $kount++)
{
    echo '<p>';
    echo $main[$kount]    . '<br />';
    echo $listing[$kount] . '<br />';
    echo $wrap[$kount]    . '<br />';
    echo '</p>' . PHP_EOL;
}
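If the real pages are reasonably well-formed, PHP's built-in DOM extension is an alternative to matching on literal tag strings. This is only a sketch under assumptions: it assumes the real markup writes `class="wrap"` with the equals sign (the test data above does not), and the shortened sample document and the `$rows` layout are made up for illustration.

```php
<?php
// Hypothetical sample: same shape as the test data, with class="wrap" written correctly.
$html = <<<'EOD'
<html><body>
<div class="container">
<h3 class="main">title here</h3>
<div class="listing">more text here</div>
<span class="wrap">wrap up text</span>
</div>
<div class="container">
<h3 class="main">title 2 here</h3>
<div class="listing">more text here 2</div>
<span class="wrap">wrap up text 2</span>
</div>
</body></html>
EOD;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows  = [];

// One row per container; each field is looked up relative to its own container.
foreach ($xpath->query('//div[@class="container"]') as $box) {
    $rows[] = [
        'main'    => trim($xpath->evaluate('string(.//h3[@class="main"])', $box)),
        'listing' => trim($xpath->evaluate('string(.//div[@class="listing"])', $box)),
        'wrap'    => trim($xpath->evaluate('string(.//span[@class="wrap"])', $box)),
    ];
}

foreach ($rows as $row) {
    echo '<p>' . implode('<br />', array_map('htmlspecialchars', $row)) . '</p>' . PHP_EOL;
}
```

Keeping the three values together in one row per container means a missing element yields an empty string in that row instead of shifting every later index, which can happen with three independent arrays.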

 

Author Closing Comment

by:St_Aug_Beach_Bum
ID: 40582748
Yikes, ok all.

Thank you for the help. It's not the actual HTML; I just threw that together as an example to work from. I'm pulling data from a number of sites for a project on trends, not a critical application. I looked at several spidering services, but they didn't do quite what I wanted and were costly for a small project.

Thanks again.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40582770
OK, good luck with it.  As long as you understand the risks...

best regards, ~Ray