collect & store remote images from suppliers - PHP

Hi all,

My supplier gives me a large CSV file which lists all of their products. I use this information to populate my web site.

One column in that CSV file is the location of the product image. It looks something like this:
http://www.my_supplier.com/images/prodID_12345_t.jpg

Please take note of the last bit, where it says '_t.jpg'.

What I'm looking for is a solution that will:

Search through the newly saved CSV file, looking down the images column, and then collect and save the images, storing them in directories appropriate to the product category.

So images would be stored like so:

/images/microsoft
/images/epsom
/images/hewlett_packard    etc.

Just to add another complication, the URL that is given to me needs to be changed from:

http://www.my_supplier.com/images/prodID_12345_t.jpg

to:

http://www.my_supplier.com/images/prodID_12345_L.jpg

Notice how I have changed the 't' to an 'L'.

By doing so, I get the large image rather than the thumbnail image.

all help appreciated.

thanks

zac
bede123Asked:

karoldvlCommented:
Something like this should do the job.

There are some caveats here though:
1) It's rudimentary - I assume there are no errors in the CSV file and the file itself is safe - no validation or error checking at all.
2) You have to set $fields[1] and $fields[2] to corresponding columns in the input file.
3) The script assumes you have directories for the categories in place. If they're not there, it will fail.

What's more - if you have big files to process, you need to split them into smaller chunks so that you don't exceed the execution time limit.
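<?php

// Read the CSV into an array of lines, without trailing newlines or blank lines.
$csv = file('data.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($csv as $line) {
	$fields = explode(';', $line);
	
	// Adjust these indexes to match the columns of your file (see caveat 2).
	$category 	= trim($fields[1]);
	$image		= trim($fields[2]);
	
	// Swap the thumbnail suffix for the large-image suffix.
	$image = str_replace('_t.jpg', '_L.jpg', $image);
	// The local file name is the last path segment of the URL.
	$file  = basename($image);
		
	// The category directory must already exist (see caveat 3).
	copy($image, $category.'/'.$file);
}

?>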


SwafnilCommented:
Could you give the following script a try? Simply replace the variables in the header and you're good to go ;-)
<?php
$strCSVfile = "PATH/TO/CSVFILE.csv";
$LocalPicturePath = "C:\\Inetpub\\wwwroot\\catalog\\images\\";
// read the content of the csv
$contents = file_get_contents($strCSVfile);
// split the contents into an array to iterate the lines
$arrLines = explode("\n", $contents);
foreach ($arrLines as $line){
	// split the line into "cells"
	$arrCols = explode(";", $line);
	// assume your image is the 4th entry, the category is the 2nd entry; PHP has zero-based arrays, so we need $arrCols[1] and $arrCols[3]
	$category = $arrCols[1];
	$imagePath = $arrCols[3];
	// create local files
	if (!file_exists($LocalPicturePath.$category)){
		if (!mkdir($LocalPicturePath.$category)){
			print "Failed to create directory $category in $LocalPicturePath!<br />";	
		} else {
			print "Folder $category created<br />";
		}
	}
	// exchange t and L
	$remoteFilePath = substr($imagePath, 0, strlen($imagePath)-5)."L".substr($imagePath, strlen($imagePath)-4);
	// retrieve the filename (last entry after the slash)
	$arrFilePath = explode("/", $remoteFilePath);
	$filename = $arrFilePath[count($arrFilePath)-1];
	// download the file
	if (file_put_contents($LocalPicturePath.$category."\\".$filename, file_get_contents($remoteFilePath))){
		print "Saved $filename in folder $category<br />";
	} else {
		print "Failed saving $filename in folder $category<br />";	
	}	
}
print "DONE.";
?>


SwafnilCommented:
@karoldvl: in Germany I would say "Mist" because you were not only faster, your script is even shorter :-)
There's only one bug: you didn't give a path to the save directory, so the files would be added to the script's own folder, and I'm not sure whether copy() automatically creates folders if they don't exist ...

karoldvlCommented:
It's #3. I assumed they go into the webroot.

To be sure:
copy($image, 'path/to/the/images/'.$category.'/'.$file);

And as said in #3, copy won't create the directories. Dirty way to do it:
mkdir('path/to/the/images/'.$category);

But I hope it's just one use (batch import). For production (constant) use it would require a lot of polishing.
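A slightly less dirty variant of the mkdir step could look like the sketch below. It is only a sketch: the function name ensureCategoryDir and its parameters are made up for illustration and do not appear in the scripts above. It assumes the category value is safe to use as a folder name.

```php
<?php
// Hypothetical helper: make sure the folder for one category exists.
// $baseDir and $category are illustrative names, not taken from the scripts above.
function ensureCategoryDir($baseDir, $category)
{
    $dir = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . trim($category);

    // is_dir() avoids the warning mkdir() raises when the folder already exists.
    // The third argument (true) also creates any missing parent folders.
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: $dir");
    }

    return $dir;
}
```

It could then be combined with the copy call, for example copy($image, ensureCategoryDir('path/to/the/images', $category).'/'.$file);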
bede123Author Commented:
Thanks for the input, guys. I'm just trying this out now. I'll get back ASAP. Thanks.

bede123Author Commented:
ok nearly there i feel....

i get this:

PHP Fatal error: Allowed memory size of 20971520 bytes exhausted (tried to allocate 1305 bytes) in E:\Domains\domain_name\wwwroot\files\fetch_and_save_images.php on line 7
bede123Author Commented:
i put this at the top of the page:

ini_set("memory_limit","120M");


it seems to be working. is that ok?
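Raising the limit is fine for a one-off import. An alternative, sketched below, is to read the CSV one row at a time so the whole file never sits in memory. The helper name readCsvRows is hypothetical, the ';' delimiter is assumed from the scripts in this thread, and generators need PHP 5.5 or later.

```php
<?php
// Hypothetical helper: yield one CSV row at a time instead of loading the whole file.
// The ';' delimiter matches the scripts in this thread; adjust it to your file.
function readCsvRows($path, $delimiter = ';')
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open CSV file: $path");
    }

    try {
        // fgetcsv() reads a single line per call, so memory use stays flat.
        while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
            // A blank line comes back as an array holding a single null; skip it.
            if ($row === array(null)) {
                continue;
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}
```

The loop in the script would then become foreach (readCsvRows($strCSVfile) as $arrCols) { ... }, with no explode calls needed.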

bede123Author Commented:
ok think we're getting somewhere.

the error now is because we are putting the 'L' in the wrong place.

here is the actual url to the image/s:
http://domain.com/dev/6/9/04950196/t_0495019L.jpg

you see we are putting the 'L' at the end but we need to replace the 't' with the 'L'

?




karoldvlCommented:
So, does your input URL look like this:
.../t_0495019L.jpg

and you want this:
.../L_0495019L.jpg

Is that correct? Or something else?
bede123Author Commented:
yes that is exactly correct
bede123Author Commented:
erm... NO, sorry without the 'L' at the end
karoldvlCommented:
Please see if changing those two lines works:
      $image = str_replace('/t_', '/L_', $image);
      $file  = preg_replace('/(.*)\/L_(.*)\.jpg/', 'L_\2.jpg', $image);
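The same idea can be written as a small standalone function, sketched below. The name toLargeImageUrl is made up for illustration, and it assumes every thumbnail file name starts with "t_". Unlike a plain str_replace on '/t_', it only touches the file name, so a folder in the path that happens to start with "t_" is left alone.

```php
<?php
// Hypothetical helper: turn a thumbnail URL (.../t_<id>.jpg) into the
// large-image URL (.../L_<id>.jpg). Only the file name is touched.
function toLargeImageUrl($url)
{
    $url = trim($url);
    $slash = strrpos($url, '/');
    if ($slash === false) {
        return $url; // no path separator: leave the value untouched
    }

    $dir  = substr($url, 0, $slash + 1);
    $name = substr($url, $slash + 1);

    // Swap the prefix only when the file name really starts with "t_".
    if (strpos($name, 't_') === 0) {
        $name = 'L_' . substr($name, 2);
    }

    return $dir . $name;
}

$large = toLargeImageUrl("http://dev.domain.com/dev/8/5/04906358/t_04906358.jpg\n");
$file  = basename($large); // local file name, here L_04906358.jpg
```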
bede123Author Commented:
sorry, but do you mean I should replace something with those two lines, or just add those two lines in somewhere?
karoldvlCommented:
The whole script with all the changes:
<?php

define('IMAGEDIR', '/path/to/the/images/');

$csv = file('data.csv');

foreach ($csv as $line) {
        $fields = explode(';', $line);
        
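        $category       = trim($fields[1]);
        $image          = trim($fields[2]);
        
        // Point at the large image instead of the thumbnail.
        $image = str_replace('/t_', '/L_', $image);
        // Extract the local file name (L_<id>.jpg) from the URL.
        $file  = trim(preg_replace('/(.*)\/L_(.*)\.jpg/', 'L_\2.jpg', $image));

        // Create the category directory only if it is missing.
        if (!is_dir(IMAGEDIR.$category)) {
                mkdir(IMAGEDIR.$category);
        }
        copy($image, IMAGEDIR.$category.'/'.$file);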
}

?>


bede123Author Commented:
just been doing a little more testing with this.... I've run the script a couple of times and by looking at my server via FTP I can see that all of the appropriate directories have been created however none of the images have been collected.

here is the code i'm using:

<?php 

ini_set("memory_limit","120M");


$strCSVfile = "E:\\Domains\\wwwroot\\files\\my.csv"; 
$LocalPicturePath = "E:\\Domains\\wwwroot\\images\\"; 
// read the content of the csv 
$contents = file_get_contents($strCSVfile); 
// split the contents into an array to iterate the lines 
$arrLines = explode("\n", $contents); 
foreach ($arrLines as $line){ 
        // split the line into "cells" 
        $arrCols = explode(";", $line); 
        // assume your image is the 2nd entry, the category is the 12th entry; PHP has zero-based arrays
        $imagePath = $arrCols[1];
		$category = $arrCols[11]; 
		
		 // exchange t and L 
		$image = str_replace('/t_', '/L_', $image); 
        $file  = trim(preg_replace('/(.*)\/L_(.*)\.jpg/', 'L_\2.jpg', $image)); 
		
        // create local files 
        if (!file_exists($LocalPicturePath.$category)){ 
                if (!mkdir($LocalPicturePath.$category)){ 
                        print "Failed to create directory $category in $LocalPicturePath!<br />";        
                } else { 
                        print "Folder $category created<br />"; 
                } 
        } 
        		
		 
        // retrieve the filename (last entry after the slash) 
        $arrFilePath = explode("/", $remoteFilePath); 
        $filename = $arrFilePath[count($arrFilePath)-1]; 
        // download the file 
        if (file_put_contents($LocalPicturePath.$category."\\".$filename, file_get_contents($remoteFilePath))){ 
                print "Saved $filename in folder $category<br />"; 
        } else { 
                print "Failed saving $filename in folder $category<br />";       
        }        
} 
print "DONE."; 
?>


bede123Author Commented:
ok I have reverted to using the code below, because at least that gives me an error and I can see what it's doing wrong....

it seems to be removing the last digit on the image URL and replacing it with 'L'

but we don't want the last digit removed, and the 'L' shouldn't be at the end, it should be here:

http://dev.domain.com/dev/8/5/04906358/L_04906358.jpg

see where the 'L' is?


<?php 

ini_set("memory_limit","120M");


$strCSVfile = "E:\\Domains\\domain\\wwwroot\\files\\my.csv"; 
$LocalPicturePath = "E:\\Domains\\domain\\wwwroot\\images\\"; 
// read the content of the csv 
$contents = file_get_contents($strCSVfile); 
// split the contents into an array to iterate the lines 
$arrLines = explode("\n", $contents); 
foreach ($arrLines as $line){ 
        // split the line into "cells" 
        $arrCols = explode(";", $line); 
        // assume your image is the 2nd entry, the category is the 12th entry; PHP has zero-based arrays, so we need $arrCols[1] and $arrCols[11] 
        $imagePath = $arrCols[1];
		$category = $arrCols[11]; 
		
		// exchange t and L 
        $remoteFilePath = substr($imagePath, 0, strlen($imagePath)-5)."l".substr($imagePath, strlen($imagePath)-4);
		
		
			// retrieve the filename (last entry after the slash) 
        $arrFilePath = explode("/", $remoteFilePath); 
        $filename = $arrFilePath[count($arrFilePath)-1];
		
        // create local files 
        if (!file_exists($LocalPicturePath.$category)){ 
                if (!mkdir($LocalPicturePath.$category)){ 
                        print "Failed to create directory $category in $LocalPicturePath!<br />";        
                } else { 
                        print "Folder $category created<br />"; 
                } 
        } 
        		
		 
        // download the file 
        if (file_put_contents($LocalPicturePath.$category."\\".$filename, file_get_contents($remoteFilePath))){ 
                print "Saved $filename in folder $category<br />"; 
        } else { 
                print "Failed saving $filename in folder $category<br /><br />";       
        }        
} 
print "DONE."; 
?>


karoldvlCommented:
You're mixing both scripts here. You have to stick with one. See if this works:
<?php 

ini_set("memory_limit","120M");


$strCSVfile = "E:\\Domains\\domain\\wwwroot\\files\\my.csv"; 
$LocalPicturePath = "E:\\Domains\\domain\\wwwroot\\images\\"; 
// read the content of the csv 
$contents = file_get_contents($strCSVfile); 
// split the contents into an array to iterate the lines 
$arrLines = explode("\n", $contents); 
foreach ($arrLines as $line){ 
        // split the line into "cells" 
        $arrCols = explode(";", $line); 
        // assume your image is the 2nd entry, the category is the 12th entry; PHP has zero-based arrays, so we need $arrCols[1] and $arrCols[11] 
        $imagePath = $arrCols[1];
        $category = $arrCols[11]; 
                
                
        $remoteFilePath = str_replace('/t_', '/L_', $imagePath);
        $filename  = trim(preg_replace('/(.*)\/L_(.*)\.jpg/', 'L_\2.jpg', $remoteFilePath));
                
        // create local files 
        if (!file_exists($LocalPicturePath.$category)){ 
                if (!mkdir($LocalPicturePath.$category)){ 
                        print "Failed to create directory $category in $LocalPicturePath!<br />";        
                } else { 
                        print "Folder $category created<br />"; 
                } 
        } 
                        
        // download the file 
        if (file_put_contents($LocalPicturePath.$category."\\".$filename, file_get_contents($remoteFilePath))){ 
                print "Saved $filename in folder $category<br />"; 
        } else { 
                print "Failed saving $filename in folder $category<br /><br />";       
        }        
} 
print "DONE."; 
?>


bede123Author Commented:
sorry, that was my attempt at 'trial & error'

OK, so what's happening now is that it's timing out. I think this is a good sign, because there are a lot of images.

Is there a way to limit it to, say, 100 just for a test?
SwafnilCommented:
Try changing line 20 from

[CODE]
$remoteFilePath = substr($imagePath, 0, strlen($imagePath)-5)."l".substr($imagePath, strlen($imagePath)-4);
[/CODE]

into

[CODE]
$remoteFilePath = str_replace("/t_", "L_", $imagePath);
[/CODE]

My first example would have exchanged the "t" right before ".jpg", and I hadn't modified it yet; exchanging the "t_" with "L_" is much more practical with str_replace.

If you run into timeouts, add
[CODE]
set_time_limit (3600);
[/CODE]

somewhere in the first lines of the script to have your script run for 1 hour.
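For a quick test on only the first 100 images, one option is to trim the line array before the loop. The sketch below assumes the lines are already in an array such as $arrLines; the helper name firstLines is made up for illustration.

```php
<?php
// Hypothetical helper: keep only the first $limit non-empty lines for a test run.
function firstLines(array $lines, $limit)
{
    // Drop blank lines (for example the one after the final "\n") before counting.
    $nonEmpty = array_filter($lines, function ($line) {
        return trim($line) !== '';
    });

    return array_slice(array_values($nonEmpty), 0, $limit);
}

// In the script above this would be: $arrLines = firstLines($arrLines, 100);
$sample = firstLines(array("a;1", "", "b;2", "c;3"), 2);
```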

bede123Author Commented:
hmm ok, I now have this at the top of the script:

set_time_limit (3600);
ini_set("memory_limit","120M");

But it still times out. I also tried putting quotes round the 3600, but that didn't make any difference.

I also changed 'L_' to '/l_', but that shouldn't make any difference to it timing out, right?
bede123Author Commented:
OK, so I worked out that even though it is timing out, it is still collecting images. So I guess if we could make it only attempt to download an image IF that image doesn't already exist, then we might be able to make this complete the routine.

But to be honest, I have pushed this question long enough, and I'll ask another question about the IF.

Thanks so much for all your help. It's been great and truly appreciated.

thanks
zac
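A rough sketch of that IF, for anyone landing here: skip the download when the local file is already there, so a rerun after a timeout picks up where the last one stopped. The function name downloadIfMissing and the $fetch parameter are made up for illustration. The sketch treats an empty local file as missing, on the assumption that a failed download in the scripts above can leave a zero-byte file behind.

```php
<?php
// Hypothetical helper: fetch $url into $localPath unless the file is already there.
// $fetch defaults to file_get_contents; it is a parameter only so the logic
// can be tried without network access.
function downloadIfMissing($url, $localPath, $fetch = 'file_get_contents')
{
    // Skip files saved by an earlier run (an empty file counts as missing).
    if (is_file($localPath) && filesize($localPath) > 0) {
        return 'skipped';
    }

    $data = call_user_func($fetch, $url);
    if ($data === false || $data === '') {
        return 'failed'; // nothing is written, so the next run will retry
    }

    return file_put_contents($localPath, $data) !== false ? 'saved' : 'failed';
}
```

In the loop this would replace the file_put_contents / file_get_contents pair, e.g. $result = downloadIfMissing($remoteFilePath, $LocalPicturePath.$category."\\".$filename);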
bede123Author Commented:
hope everyone is ok with how i split the points.