Mehran Goudarzi
 asked on

curl parse data from site

My URL is:
https://en.wikipedia.org/wiki/List_of_Iranian_cities_by_population



I want to parse everything under the City column and save it to a text file.
* cURL * PHP * Parsing

Last Comment
Mehran Goudarzi

8/22/2022 - Mon
Jeff Darling

Here is a list of the cities saved from that table.

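<?php

// Fetch the page with cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://en.wikipedia.org/wiki/List_of_Iranian_cities_by_population");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
curl_close($ch);

// Stop early if the download failed, instead of parsing an empty document.
if ($html === false || $html === '') {
    die("Unable to download the page!");
}

// Parse the HTML. Collect libxml warnings about HTML5 markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$tableNumber = 0;

foreach ($dom->getElementsByTagName('table') as $table) {
    // Only count tables that have a class attribute.
    if (!$table->hasAttribute('class')) {
        continue;
    }

    $tableNumber = $tableNumber + 1;

    // The first such table is the city list.
    if ($tableNumber == 1) {

        $myfile = fopen("newfile.txt", "w") or die("Unable to open file!");

        foreach ($table->getElementsByTagName('tr') as $tr) {
            $tds = $tr->getElementsByTagName('td');
            // Header rows contain only <th> cells, so skip rows without a second <td>.
            if ($tds->length < 2) {
                continue;
            }
            // The second cell of each row holds the city name.
            fwrite($myfile, trim($tds->item(1)->nodeValue)."\n");
        }

        fclose($myfile);

    }
}

echo "Done";

?>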


Mehran Goudarzi

ASKER
Thanks, but my address is blank:
http://localhost/city.php


Jeff Darling

Can you explain which address is blank?

This is the list I get when running that code.

Tehran
Mashhad
Isfahan
Karaj
Tabriz
Shiraz
Ahvaz
Qom
Kermanshah
Orumieh
Rasht
Zahedan
Kerman
Arak
Hamedan
Yazd
Ardabil
Bandar Abbas
Eslamshahr
Qazvin
Zanjan
Khorramabad
Sanandaj
Malard
Shahr-e Qods
Kashan
Gorgan
Golestan
Sari
Shahriar
Dezful
Khomeinishahr
Borujerd
Nishapur
Sabzevar
Najafabad
Amol
Babol
Varamin
Abadan
Pakdasht
Khoy
Saveh
Bojnourd
Qa'em Shahr
Bushehr
Gharchak
Sirjan
Birjand
Ilam
Bukan
Maragheh
Malayer
Shahrekord
Nasimshahr
Mahshahr
Semnan
Rafsanjan
Mahabad
Gonbad-e Qabus
Shahinshahr
Shahrood
Saqqez
Marvdasht
Zabol
Torbat-e Heydarieh
Khorramshahr
Andimeshk
Marand
Shahreza
Miandoab
Izeh
Bandar-e Anzali
Jahrom
Jiroft
Marivan
Kamal Shahr
Yasuj
Nazarabad
Behbahan
Bam
Shush
Fasa
Quchan
Masjed Soleyman
Mohammadshahr
Dorud


Mehran Goudarzi

ASKER
I copied and pasted your code into city.php on my Apache server. Now, when I open it at my URL, the page is blank.
Jeff Darling

Does the file exist?

<?php

$filename = 'newfile.txt';

if (file_exists($filename)) {
    echo 'The file '.$filename.' exists.<br>';
} else {
    echo 'The file '.$filename.' does not exist.<br>';
}
?>


Mehran Goudarzi

ASKER
/var/www/html# cat city.php 
<?php

$matches = array();

$dom = new DOMDocument;


$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://en.wikipedia.org/wiki/List_of_Iranian_cities_by_population");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
curl_close($ch);

$dom->loadHTML($html );

$tableNumber = 0;



foreach ($dom->getElementsByTagName('table') as $table) {
    if (!$table->hasAttribute('class')) {
        continue;
    }
    
    $tableNumber = $tableNumber + 1;
    
    $class = explode(' ', $table->getAttribute('class'));
    
    
    if (in_array('wikitable', $class)) {
        $matches[] = $table->getElementsByTagName('tr');
    }
    
    if ($tableNumber == 1) {
       
        $myfile = fopen("newfile.txt", "w") or die("Unable to open file!");

        foreach ($table->getElementsByTagName('tr') as $tr) {
            $tds = $tr->getElementsByTagName('td'); 
            fwrite($myfile, $tds->item(1)->nodeValue."\n");
        }

        fclose($myfile);
        
    }



}

echo "Done";


?>
root@kali:/var/www/html# 




And the file doesn't exist.
Jeff Darling

There must be some errors happening.  Maybe curl isn't enabled?

<?php

phpinfo();

?>



Look for a section like this:

curl
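
If reading through the whole phpinfo() page is awkward, the same check can be done from a script. This is only a sketch: the function name missingExtensions() is made up for illustration, and the list of extensions is an assumption based on what the city.php code above uses (cURL for the download, DOM/libxml for DOMDocument).

```php
<?php
// Hypothetical helper: return the names of the extensions that are NOT loaded.
function missingExtensions(array $names)
{
    $missing = [];
    foreach ($names as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Assumed list: what the scraping script in this thread depends on.
$missing = missingExtensions(['curl', 'dom', 'libxml']);

if ($missing === []) {
    echo "All required extensions are loaded\n";
} else {
    echo "Missing extensions: " . implode(', ', $missing) . "\n";
}
```

If "dom" shows up as missing, `new DOMDocument` would stop the script with a fatal error before anything is printed, which could also explain a blank page.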
Ray Paseur

If you don't have cURL, you may be able to read the document with file_get_contents().
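
A rough sketch of that fallback, in case it helps. The function name fetchPage() is hypothetical, and the user-agent string is just a placeholder value. Note that file_get_contents() can only read URLs when allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helper: fetch a URL with cURL when it is available,
// otherwise fall back to file_get_contents(). Returns the body as a
// string, or null if the page could not be read.
function fetchPage($url)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'city-list-demo/1.0'); // placeholder value
        $body = curl_exec($ch);
        curl_close($ch);
        return is_string($body) ? $body : null;
    }

    // No cURL: URL reading via file_get_contents() needs allow_url_fopen.
    if (ini_get('allow_url_fopen')) {
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }

    return null;
}

// Usage (commented out so the sketch does not hit the network when pasted):
// $html = fetchPage('https://en.wikipedia.org/wiki/List_of_Iranian_cities_by_population');
// if ($html === null) { die('Unable to download the page!'); }
```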
Mehran Goudarzi

ASKER
Yes, same as yours, but my version is different.
Screen-Shot-2017-04-11-at-10.59.50-P.png
Colin_UK

Hi,

It may be a permissions problem.
The web server will need write permission to the directory that city.php is served from and by default RHEL does not allow it.
You may need to add Write permission to the directory.

Hope this helps
Colin
Mehran Goudarzi

ASKER
I set 777 on it:

/var/www/html# ls -l
total 36
-rw-r--r-- 1 root root    22 Apr 11 22:57 check.php
-rwxrwxrwx 1 root root  1019 Apr 11 22:15 city.php
-rw-r--r-- 1 root root    75 Mar 30 15:41 composer.json
-rw-r--r-- 1 root root 11971 Mar 30 15:41 composer.lock
drwxrwxrwx 3 root root  4096 Mar 30 16:32 inst
drwxr-xr-x 3 root root  4096 Apr 11 13:52 login
drwxrwxrwx 8 root root  4096 Mar 30 15:58 vendor



My PHP version is 7.0.16-3. Is it possible that the code is not compatible with my version?
Colin_UK

Hi Mehran,

Could you post an ls -al so it shows the permissions of the directory too please?
And it's not +x you need; that allows the file to be executed. You need w for write permission. It's also not city.php that matters: it is the directory where city.php wants to create a file called newfile.txt.

Colin
Jeff Darling

If it couldn't write the file, then you should see a message like this:

Unable to open file!



Might have to look at the php log.  Do you have logging enabled on the server and know the location?
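
A blank page with neither "Done" nor "Unable to open file!" is often a sign that PHP stopped on a fatal error while display_errors is switched off. As a rough sketch, lines like these at the very top of city.php should make the error show up in the browser. This is meant for a development box only, not a public server, and it will not help with syntax errors in the same file, because then the file never starts running.

```php
<?php
// Development only: show errors in the browser instead of a blank page.
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);

// Show where PHP is configured to write its error log. An empty value
// means errors go to the web server's own error log (or stderr on the CLI).
$logFile = ini_get('error_log');
echo 'error_log = ', ($logFile ? $logFile : '(not set)'), "\n";
```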
Mehran Goudarzi

ASKER
/var/www/html# ls -al
total 44
drwxr-xr-x 5 root root  4096 Apr 11 22:57 .
drwxr-xr-x 4 root root  4096 Feb 23 15:38 ..
-rw-r--r-- 1 root root    22 Apr 11 22:57 check.php
-rwxrwxrwx 1 root root  1019 Apr 11 22:15 city.php
-rw-r--r-- 1 root root    75 Mar 30 15:41 composer.json
-rw-r--r-- 1 root root 11971 Mar 30 15:41 composer.lock
drwxrwxrwx 3 root root  4096 Mar 30 16:32 inst
drwxr-xr-x 3 root root  4096 Apr 11 13:52 login
drwxrwxrwx 8 root root  4096 Mar 30 15:58 vendor


Colin_UK

Hi,

I'd say this is the problem.
drwxr-xr-x 5 root root  4096 Apr 11 22:57 .

It does not allow writing to the current directory by any user other than root.
As far as I know RHEL will switch to nobody or apache user once it is loaded.

The easiest solution is to change the directory to 777, although that is not a secure solution (it depends on whether the web service is public).
You could instead create a directory called data and open up only that directory, i.e. "chmod 777 data", then change the PHP code to write to data/newfile.txt.

Another option is to create a file called newfile.txt and remove the restrictions from that one file (this will stop working if you subsequently delete the file).
i.e. (you will need to be root to do this):
touch /var/www/html/newfile.txt
chmod 777 /var/www/html/newfile.txt

Hope this helps
Colin
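
To confirm from PHP itself whether the web server user can write to a directory, something like the sketch below could be dropped into a test script. The helper name writableTarget() is made up for illustration, and the data directory is the one suggested above; it is assumed to sit next to the script.

```php
<?php
// Hypothetical helper: return the full path for $name inside $dir if the
// PHP process can write to that directory, or false if it cannot.
function writableTarget($dir, $name)
{
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    return rtrim($dir, '/') . '/' . $name;
}

// Assumed layout: a "data" directory next to this script.
$target = writableTarget(__DIR__ . '/data', 'newfile.txt');

if ($target === false) {
    echo "The data directory is missing or not writable by PHP\n";
} else {
    file_put_contents($target, "Tehran\n");
    echo "Wrote $target\n";
}
```

Run it through the browser rather than from a root shell, since the check only means something when it runs as the web server's user.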
Ray Paseur

I hope someone else can help with your file permissions.

This seems to work for me.  It throws a notice over a piece of missing data, but it seems to extract the information from the table.
https://iconoun.com/demo/temp_mehran.php
<?php // demo/temp_mehran.php
/**
 * https://www.experts-exchange.com/questions/29015495/curl-parse-data-from-site.html
 */
error_reporting(E_ALL);
echo '<pre>';

// A CLASS TO REPRESENT INFORMATION ABOUT A CITY
Class City
{
    public $rank, $name, $province, $founded, $pop2011;
}

// A COLLECTION OF EXTRACTED DATA
$cities = [];

// READ FROM THE WIKIPEDIA
$url = 'https://en.wikipedia.org/wiki/List_of_Iranian_cities_by_population';
$doc = file_get_contents($url);
if (!$doc) trigger_error("Unable to read $url", E_USER_ERROR);

// ACTIVATE THIS TO SEE THE HTML DOCUMENT
// echo htmlentities($doc);

// TRIM THE DOCUMENT TO ISOLATE THE INFORMATION WE WANT
$sig = '<table class="wikitable sortable">';
$doc = explode($sig, $doc);
$sig = '</table>';
$doc = explode($sig, $doc[1]);
$doc = $doc[0];

// TIDY UP SOME UNRULY TAGS
$doc = str_replace('<td align="center">', '<td>', $doc);
$sig = '<tr>';
$trs = explode($sig, $doc);
unset($trs[0], $trs[1]);

// PROCESS THE DATA IN EACH TABLE ROW
$sig = '<td>';
foreach ($trs as $tr)
{
    $tds = explode($sig, $tr);
    $city = new City;
    $city->rank     = trim( strip_tags($tds[1]) );
    $city->name     = trim( strip_tags($tds[2]) );
    $city->province = trim( strip_tags($tds[3]) );

    $city->founded  = trim( strip_tags($tds[4]) );
    $city->founded  = substr($city->founded,0,4);

    $pop2011 = explode('</span>', $tds[5]);
    $city->pop2011 = trim( strip_tags($pop2011[1]) );

    $cities[] = $city;
}

// SHOW THE WORK PRODUCT
print_r($cities);


Mehran Goudarzi

ASKER
# ls -al
total 44
drwxrwxrwx 5 root root  4096 Apr 11 22:57 .
drwxr-xr-x 4 root root  4096 Feb 23 15:38 ..
-rwxrwxrwx 1 root root    22 Apr 11 22:57 check.php
-rwxrwxrwx 1 root root  1019 Apr 11 22:15 city.php
-rwxrwxrwx 1 root root    75 Mar 30 15:41 composer.json
-rwxrwxrwx 1 root root 11971 Mar 30 15:41 composer.lock
drwxrwxrwx 3 root root  4096 Mar 30 16:32 inst
drwxrwxrwx 3 root root  4096 Apr 11 13:52 login
drwxrwxrwx 8 root root  4096 Mar 30 15:58 vendor



Same result: I still see a blank page in the browser.
Mehran Goudarzi

ASKER
I don't think it's a permission issue. Your code does not work here either; I get the same result:
https://iconoun.com/demo/temp_mehran.php



Something is preventing the code from running. Anyway, how can I get a list of just the City column?
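
For reference, one possible way to pull out only the city names is sketched below. It is not the accepted solution from this thread, just an illustration. The function name extractColumn() is hypothetical, the sample HTML is invented for the demo, and it assumes the city name is in the second cell of each row of the first "wikitable", as in the code posted earlier. The live Wikipedia markup may differ.

```php
<?php
// Hypothetical helper: return the text of one column (0-based index) from
// the first table whose class list contains "wikitable".
function extractColumn($html, $columnIndex)
{
    $previous = libxml_use_internal_errors(true); // collect HTML5 warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = $xpath->query(
        '(//table[contains(concat(" ", normalize-space(@class), " "), " wikitable ")])[1]//tr'
    );

    $values = [];
    foreach ($rows as $row) {
        $cell = $row->getElementsByTagName('td')->item($columnIndex);
        if ($cell !== null) { // header rows have <th> cells, so they are skipped
            $values[] = trim($cell->nodeValue);
        }
    }
    return $values;
}

// Invented sample data standing in for the downloaded page.
$sample = '<table class="wikitable sortable">'
        . '<tr><th>Rank</th><th>City</th></tr>'
        . '<tr><td>1</td><td>Tehran</td></tr>'
        . '<tr><td>2</td><td>Mashhad</td></tr>'
        . '</table>';

$names = extractColumn($sample, 1);
echo implode("\n", $names), "\n";
```

To save the list, the array could be written out in one call, for example file_put_contents('newfile.txt', implode("\n", $names) . "\n"), provided the directory is writable.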
ASKER CERTIFIED SOLUTION
Ray Paseur

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
Mehran Goudarzi

ASKER
Thanks.