Solved

Best web scraping software?

Posted on 2013-01-15
Medium Priority
546 Views
Last Modified: 2013-01-18
What is the best web scraping software? I would like to scrape the details available (in tabular form) on the links (around 40 links) of the following website.

http://www.amfiindia.com/amfimembers.aspx

Could you please help me understand the best software?

Thanks,
--Anand
Question by:FTbridge
6 Comments
 
LVL 27

Expert Comment

by:Lukasz Chmielewski
ID: 38781893
There is no best; it depends on what you want. If you're into programming, you can write your own scraper; if not, you can take a look at these links:

http://www.poynter.org/how-tos/digital-strategies/e-media-tidbits/102589/how-to-scrape-websites-for-data-without-programming-skills/

http://blog.outwit.com/?p=55
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38782236
The information in that web site is © Association of Mutual Funds in India (AMFI) - Copyright 2013.  You need to get their permission to copy, store or repurpose the content.  But that aside, if you can post the exact URLs of one or two of the pages you want to extract and post an example of the data you're trying to get I will be able to show you how to use a PHP script to get this information.
 

Author Comment

by:FTbridge
ID: 38782483
Thank you Ray_Paseur and Roads_Roads.

Ray_Paseur,
Here is what I want to do: The following page has links for several mutual funds.
http://www.amfiindia.com/amfimembers.aspx

If you go to the above URL, you will see the following fund as the first link:
-  BOI AXA Investment Managers Private Limited
If you click on this text, you will see the following page:
http://www.amfiindia.com/amfiMembers.aspx?mfid=46

I would like to download the name of the MF, the address, and the phone number for all the funds listed on the first page.

I am not going to sell or repurpose the content. I just want to sort these mutual funds by address. Thank you in advance for your help.
 
LVL 111

Accepted Solution

by:Ray Paseur (earned 2000 total points)
ID: 38790192
This takes a while to run, so please be patient.  
http://www.laprbass.com/RAY_temp_ftbridge.php

You may want to normalize the address a little bit before you sort (but there are only 40+ data elements so it should be straightforward).

<?php // RAY_temp_ftbridge.php
error_reporting(E_ALL);
echo '<pre>';

// THE URLS AND GETTING THE HTML DOCUMENT
$bas = 'http://www.amfiindia.com/';
$url = $bas . 'amfimembers.aspx';
$htm = file_get_contents($url);

// A SIGNAL STRING TO DECLOP THE HTML
$sig = '<a href="amfiMembers.aspx?mfid=';

// BREAK AND MANIPULATE
$arr = explode($sig, $htm);
unset($arr[0]);
foreach ($arr as $key => $str)
{
    $new = $sig . $str;
    $sub = explode('"', $new);
    $arr[$key] = trim($bas . $sub[1]);
}

// ACTIVATE THIS TO SEE THE LIST OF URLS
// print_r($arr);

// ITERATE OVER THE LINKS
$objs = array();
foreach ($arr as $lnk)
{
    // CREATE AN OBJECT TO HOLD THIS DATA AND READ THE PAGE
    $obj = new StdClass;
    $htm = file_get_contents($lnk);

    // ISOLATE THE NAME
    $sub = explode('Name of the Mutual Fund</td>', $htm);
    $sub = explode('</td>', $sub[1]);
    $sub = strip_tags($sub[0]);
    $obj->nam = $sub;

    // ISOLATE THE ADDRESS
    $sub = explode('Address of AMC</td>', $htm);
    $sub = explode('</td>', $sub[1]);
    $sub = strip_tags($sub[0]);
    $obj->add = $sub;

    // ISOLATE THE PHONE
    $sub = explode('Telephone Number</td>', $htm);
    $sub = explode('</td>', $sub[1]);
    $sub = strip_tags($sub[0]);
    $obj->fon = $sub;

    // ADD THIS OBJECT TO OUR ARRAY
    $objs[] = $obj;
}

// SHOW THE ACQUIRED DATA IN THE ARRAY OF OBJECTS
print_r($objs);
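
The address normalization mentioned before the script could be sketched roughly as follows, assuming the scraped addresses contain HTML entities, stray line breaks, and uneven spacing. The helper names `normalize_address` and `sort_funds_by_address` and the two sample records are hypothetical; only the property names `nam`, `add`, and `fon` come from the script above.

```php
<?php
// Hypothetical helper: decode entities and collapse whitespace so that
// addresses which differ only in spacing compare as equal.
function normalize_address($raw)
{
    $txt = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
    $txt = str_replace("\xC2\xA0", ' ', $txt);  // non-breaking space produced by &nbsp;
    $txt = preg_replace('/\s+/', ' ', $txt);    // tabs, CR/LF and space runs become one space
    return trim($txt, ' ,');                    // drop leading/trailing spaces and commas
}

// Hypothetical helper: normalize each address in place, then sort the
// array of objects by address, ignoring letter case.
function sort_funds_by_address(array $objs)
{
    foreach ($objs as $obj) {
        $obj->add = normalize_address($obj->add);
    }
    usort($objs, function ($a, $b) {
        return strcasecmp($a->add, $b->add);
    });
    return $objs;
}

// Made-up sample records in the same shape as the $objs array above
$one = new stdClass;
$one->nam = 'Fund A';
$one->add = "  Nariman Point,&nbsp;Mumbai \r\n 400021 ";
$one->fon = '000';

$two = new stdClass;
$two->nam = 'Fund B';
$two->add = 'andheri East,   Mumbai';
$two->fon = '111';

print_r(sort_funds_by_address(array($one, $two)));
```

With the real data, the call would be `print_r(sort_funds_by_address($objs));` in place of the final `print_r($objs);`. Sorting on the whole address string is only a starting point; sorting by city or postal code would need the address split into parts first.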


HTH, ~Ray
 

Author Closing Comment

by:FTbridge
ID: 38792070
Ray,

This is excellent! Thank you very much for your help on this! You have saved me a lot of time!

Best regards,
--Anand
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 38793395
Thanks for the points and thanks for using EE, ~Ray