Solved

Parsing out data and adding it to an array

Posted on 2015-01-31
Last Modified: 2015-02-01
I have an HTML file that I need to parse, putting the extracted information into a number of arrays.

Can you put together an example of how I can do this?

I've got something like this:

$string = <<<EOD

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


So for each <div class="container">  I need to pull out the text between:

<h3 class="main">          </h3>

<div class="listing">      </div>

<span class "wrap">      </span>

and make each into its own array, so I can later say:

<p>
echo $main[0] . '<br />';
echo $listing[0] . '<br />';
echo $wrap[0] . '<br />';
</p><p>
echo $main[1] . '<br />';
echo $listing[1] . '<br />';
echo $wrap[1] . '<br />';
</p><p>
echo $main[2] . '<br />';
echo $listing[2] . '<br />';
echo $wrap[2] . '<br />';
</p>

There will be 25 repetitions of this in all.

I know some PHP basics, of course, but I don't know how to set up a while or foreach loop for this, nor how to pull the data out from between the tags. I can certainly work from a good example if you can put one together for me.

Thanks!   Chris
Question by:St_Aug_Beach_Bum
7 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40582378
So you want to extract data from a webpage?
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582398
Sounds like you're looking for a scraper. If your PHP knowledge is that weak, you might be better off finding a library that does it.
 
LVL 25

Expert Comment

by:Kyle Hamilton
ID: 40582413
[link to competing web site removed]
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 40582481
Is this really the data you're working with?  It's not even valid HTML!
 
LVL 111

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 40582520
This does what you're asking for, but I have a feeling that there is a "backstory" here, and if we understood that we might be able to lead you to a more suitable solution.
http://iconoun.com/demo/temp_staug.php 

Web scraping is fraught with risk and you should expect any web scraping script to fail at any time without notice, so don't depend on this automation to do anything important, or to produce any output that goes directly into another automated process.  The reason this is risky is that publishers can, and do, tinker with their HTML documents all the time.  Any and all assumptions about the structure of an HTML document are at risk.
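
One way to at least notice that kind of breakage is to check that every field was found the expected number of times before using the results. The following is only a minimal sketch under that assumption; `extract_between()` and `check_counts()` are hypothetical helper names, not part of any library or of the script linked above.

```php
<?php
// Hypothetical helper: collect the text found between each occurrence
// of $open and $close in $html.
function extract_between(string $html, string $open, string $close): array
{
    // Quote both markers (including the # delimiter) so they match literally.
    $rgx = '#' . preg_quote($open, '#') . '(.*?)' . preg_quote($close, '#') . '#s';
    preg_match_all($rgx, $html, $mat);
    return $mat[1];
}

// Hypothetical guard: fail loudly when a field does not appear the
// expected number of times, which usually means the markup has changed.
function check_counts(array $fields, int $expected): void
{
    foreach ($fields as $name => $values) {
        if (count($values) !== $expected) {
            throw new RuntimeException(
                "Expected $expected '$name' values, found " . count($values)
            );
        }
    }
}
```

A caller would run `check_counts()` right after extraction and treat the exception as a signal to stop and inspect the source page, rather than passing partial data along.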

A safer and better approach to getting external data is to ask the publisher to expose an API.  APIs are typically version-controlled and stable.  API version 1.0 will always produce the same document (probably XML or JSON) and will not vary, so you can depend on the format.  API version 1.1 will represent improvements and additions to version 1.0.  Things won't really change until you get to API version 2+, etc.  

If the publisher wants you to be able to use its data, it should expose an API for you. However, this may come at a cost, since the publisher is the copyright holder and can legally charge for the use of its data.
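
To illustrate why an API is easier to depend on, here is a rough sketch of reading the same three fields from a JSON document. The payload, its `version` and `items` keys, and the `decode_listings()` function are all assumptions made up for this example; a real publisher's API will differ. It also assumes PHP 7.3 or later for `JSON_THROW_ON_ERROR`.

```php
<?php
// Hypothetical JSON document standing in for what a versioned API
// might return. The field names are assumptions for illustration only.
$json = <<<'JSON'
{"version": "1.0", "items": [
  {"main": "title here", "listing": "more text here", "wrap": "wrap up text"},
  {"main": "title 2 here", "listing": "more text here 2", "wrap": "wrap up text 2"}
]}
JSON;

// Hypothetical helper: decode the document and return one array per field.
function decode_listings(string $json): array
{
    // Throws JsonException on malformed input instead of returning null.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    return [
        'main'    => array_column($data['items'], 'main'),
        'listing' => array_column($data['items'], 'listing'),
        'wrap'    => array_column($data['items'], 'wrap'),
    ];
}
```

With a structured document there are no regular expressions and no assumptions about tag layout; the fields are addressed by name.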

<?php

/**
 * See http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28607748.html
 */
error_reporting(E_ALL);

// TEST DATA CREATED USING NOWDOC SYNTAX
$str = <<<'EOD'

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


// THE SIGNAL INFORMATION
$trap['main']    = [ '<h3 class="main">',      '</h3>' ];
$trap['listing'] = [ '<div class="listing">',  '</div>' ];
$trap['wrap']    = [ '<span class "wrap">',    '</span>' ];


// EXTRACT THE DATA AND BUILD NEW ARRAYS
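foreach ($trap as $var => $arr)
{
    // QUOTE THE SIGNAL STRINGS (INCLUDING THE # DELIMITER) AND CAPTURE THE TEXT BETWEEN THEM
    $rgx = '#' . '(' . preg_quote($arr[0], '#') . ')(.*?)(' . preg_quote($arr[1], '#') . ')#';
    preg_match_all($rgx, $str, $mat);

    // VARIABLE VARIABLE: CREATES $main, $listing AND $wrap FROM THE ARRAY KEYS
    $$var = $mat[2];
}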


// SHOW THE WORK PRODUCTS
// THE LOOP STOPS AT THE FIRST INDEX WITH NO $main ENTRY, SO NO MATCHES MEANS NO OUTPUT
for ($kount = 0; isset($main[$kount]); $kount++)
{
    echo '<p>';
    echo $main[$kount]    . '<br />';
    echo $listing[$kount] . '<br />';
    echo $wrap[$kount]    . '<br />';
    echo '</p>' . PHP_EOL;
}
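
If the real pages are reasonably well-formed, an alternative to matching literal strings is to parse the document and query it by class name. The sketch below uses PHP's bundled DOMDocument and DOMXPath classes instead of the regular expressions above. It assumes valid attributes such as `class="wrap"` (unlike the test data), simple class names with a single class per element, and that the dom extension is enabled. `extract_by_class()` is a hypothetical name.

```php
<?php
// Hypothetical helper: for every <div class="container">, take the text of
// the first descendant carrying each requested class.
function extract_by_class(string $html, array $classes): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so imperfect HTML does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $out   = array_fill_keys($classes, []);

    foreach ($xpath->query('//div[@class="container"]') as $container) {
        foreach ($classes as $class) {
            // Search relative to the current container only.
            $node = $xpath->query('.//*[@class="' . $class . '"]', $container)->item(0);
            // A missing element is recorded as null so the indexes stay aligned.
            $out[$class][] = $node ? trim($node->textContent) : null;
        }
    }

    return $out;
}
```

Calling `extract_by_class($html, ['main', 'listing', 'wrap'])` returns one array per class, indexed by container, so `$result['main'][0]` and `$result['wrap'][0]` always belong to the same listing. This is still scraping and carries the same risks described above.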

 

Author Closing Comment

by:St_Aug_Beach_Bum
ID: 40582748
Yikes, ok all.

Thank you for the help. It's not the actual HTML; I just threw that together as an example to work from. I'm pulling data from a number of sites for a project on trends, not a critical application. I looked at several spidering services, but they didn't do quite what I wanted and were costly for a small project.

Thanks again.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 40582770
OK, good luck with it.  As long as you understand the risks...

best regards, ~Ray