Checking if an image exists using regular expressions?

I have an images directory containing lots of property photos. These photos have different naming formats, but they essentially take this form:
prefixPROPERTYIDsuffix.jpg
The property ID is usually 5 or more digits long, and there may or may not be a prefix or a suffix.

Here are some examples:

138671a.jpg
138671b.jpg
138672a.jpg
clsc24900d.jpg
s339447_701_14.jpg
s339447_801_22.jpg

In displaying a property photo on my website, I want to check if a photo exists for that property. If a photo doesn't exist, I display a generic "photo not available" image instead. I also want to know if there are multiple property photos available for a property (see the examples above).

My current solution to this problem is this:
-----------------------------------------------------------------------------------------------------
# $listing is the 5-6 digit propertyid variable
# $config['imagepath'] is the path to my images folder
# $config['imageprefix'] is the possible prefix for an image - it may or may not exist. The imagesuffix is tough to predict.

$filearray = array();
exec("ls -l ".$config['imagepath'].$config['imageprefix'].strtolower($listing)."*.jpg 2> /dev/null",$output);
for ($i=0; $i < sizeof($output); $i++) {
   # preg_match() with the i flag replaces eregi(), which was removed in PHP 7
   if (preg_match("#/photos/(.+\.jpg)#i",$output[$i],$found)) $filearray[] = $found[1];
}
# test the matched files, not the raw ls output
if (sizeof($filearray) > 0) {
   $propertyphoto = $config['imagepath'].$filearray[0];
   if (isset($filearray[1])) $multiplephotos = "yes";
} else {
   $propertyphoto = $config['imagepath']."photonotavailable.jpg";
}
-----------------------------------------------------------------------------------------------------

This solution works great. HOWEVER, it is very slow: my photos directory contains almost 2 GB of photos, and my page displaying properties and their photos takes 5-10 seconds to load.
I was thinking my only other alternative is to use the file_exists() function, but I don't think I can use regular expressions with that function? If I hardcode the full photo names into file_exists() calls, my page loads instantaneously, because it doesn't have to list the image folder.

Has anyone got any other ideas or approaches that I could try, or that could speed up my image query?
Asked by: eastwop

gruntar Commented:
Yes, scanning the whole directory is not good at all. There are two alternatives.

1. When you insert a property into the database you get a property ID. Use that ID to create a folder and put the images into that folder. Then, when you fetch the property, you check if the folder exists (e.g. /www/var/images/2332), and if it does, you loop through that folder only (that will be quick).

2. Create one more table in your database and store the property ID and image filenames in it when inserting a property.

property_id, filename
12    ,  somename.jpg
12    ,  next2one.jpg
152  ,  dsdsd.jpg
...

Cheers
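
A minimal sketch of the first alternative, assuming one folder per property ID under a base image directory. The function name photos_for_property() and the folder layout are hypothetical and only illustrate the idea; they are not part of the original system.

```php
<?php
// Hypothetical helper: list the .jpg files stored in a per-property folder.
// Assumes a layout of <baseDir>/<propertyId>/<photo>.jpg (an illustration only).
function photos_for_property($baseDir, $propertyId)
{
    $dir = rtrim($baseDir, '/') . '/' . $propertyId;
    if (!is_dir($dir)) {
        return array();            // no folder means no photos for this property
    }
    $photos = array();
    foreach (scandir($dir) as $entry) {
        // keep only JPEG files; this also skips "." and ".."
        if (preg_match('/\.jpg$/i', $entry)) {
            $photos[] = $entry;
        }
    }
    return $photos;
}
```

An empty result would mean "show the generic image", and more than one entry would mean multiple photos are available. Only the small per-property folder is read, never the whole photo collection.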

Roonaan Commented:
Wouldn't it be easier to use the is_file() function?

if(is_file($dir.'/'.$prefix.$id.$suffix))
  echo "File exists";
else
  echo "File does not exist";

I don't think using exec() is the appropriate way. It limits the portability of your scripts; for example, they will never run on Windows-based servers. PHP provides numerous directory and file functions, documented in the manual at http://php.net/dir and http://php.net/filesystem; it would be a good exercise to walk through those two references and see what PHP has to offer.

Regards

-r-

aib_42 Commented:
I was going to suggest simply grouping photos into directories by their property ID. The disk space overhead this will cause should be roughly:

(cluster/inode size) * (number of photos) / (average number of photos per property)

Assuming you have clusters of 4k, your average photo size is 30k (so around 70,000 photos in 2 GB), and you have 1.3 photos per property on average...

You should need about 210 MB of extra space, which is around 10%, and nothing compared to the speed gain.
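
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the same formula, assuming one cluster of overhead per property folder; the function name folder_overhead_mb() is made up for the example.

```php
<?php
// Hypothetical helper: extra disk space (in MB) for one folder per property,
// assuming each folder costs exactly one cluster.
function folder_overhead_mb($totalKb, $avgPhotoKb, $photosPerProperty, $clusterKb)
{
    $photos  = $totalKb / $avgPhotoKb;          // about 70,000 photos
    $folders = $photos / $photosPerProperty;    // one folder per property
    return $folders * $clusterKb / 1024;        // KB to MB
}

$totalKb  = 2 * 1024 * 1024;                    // 2 GB of photos, in KB
$overhead = folder_overhead_mb($totalKb, 30, 1.3, 4);
printf("%d MB extra, about %d%% of the photo data\n",
       round($overhead), round($overhead / 2048 * 100));
```

With these inputs the result comes out at roughly 210 MB, or about 10% of the 2 GB of photos.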
 
aib_42 Commented:
You cannot use regular expressions with file_exists(), but if you have a hard-copy list of the file names, you can do a regex match on it. I still recommend some method of dividing the images, however, such as grouping them by the first digit of their ID.

(However, if you use my suggestion above, you can instantly check if a photo exists by checking if the directory named after its ID exists.)

eastwop (Author) Commented:
Gruntar, I appreciate the 2 ideas, but I don't think either of them is feasible for my situation, and here's why.
I download raw property data each day in delimited text files (one of them 14 MB in size) and I run a LOAD DATA command to import the text files into my MySQL database. I have to do some extra processing to add dates for all new properties being entered into my database, which takes a good bit of processing time. Updating another table with property IDs and image filenames is an extra step I would prefer to avoid. Sometimes there are issues with the data import, and it's time-consuming enough making sure the data is in OK shape without worrying whether an image lookup table is in OK shape as well.
The other idea of creating subfolders for each property is a bit awkward given my situation. I download all the property photos in one big zip file each day, and handling that alongside the text file processing mentioned above would be a pain.

In actual fact I am importing property data from 6 different property systems, and they provide their photos in different formats: some in .zip files, some in .gz, and some with subfolders in the zip files. And of course the image naming is different for each.

Any other approaches? Perhaps there is a more elegant Unix command I could use besides "ls" to accomplish what I want?

Thanks again for your help!

eastwop (Author) Commented:
Roonaan,
I think I run into the same problem with is_file() as with file_exists(): I can't use regular expressions with the filename?

aib_42, I realise that splitting the photos up into subdirectories would solve my problem, but I'm a little worried about going down that route. It means that when I am unzipping the photo zip files, I must perform a few more steps. I am doing this for a few different zip file formats right now, and with my plans to include more property systems in the near future, without knowing what formats they may have, I fear the limitations and headaches. Let's just say I've spent a long while perfecting automated, generic approaches to handling my data importing and photo extraction.

I have thought about this hard-copy text file with a list of all the file names. I'm not sure how big it would be to describe 2 GB of photos... perhaps a 20 MB file? Would it be too intensive to do a grep on that file each time I want to check if a photo exists? I could create this text file once a day. I presume doing an fopen, ereg, fclose on this file each time would also be too intensive.

eastwop (Author) Commented:
Or what about this approach.
Once a day, dump an "ls" of the whole photos directory into a delimited text file containing just the name of each file, then do a LOAD DATA import into a MySQL table. I would then have a faster way of checking if a file exists, by running a query where filename like "%$propertyid%"?
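
One thing to watch with a plain substring match such as "%$propertyid%" is that a shorter ID also matches inside a longer one (13867 would match 138671a.jpg). The sketch below shows the same lookup done against a list of file names with a regular expression that refuses a digit directly before or after the ID. The function name find_photos() is hypothetical, and the list is assumed to come from the once-a-day dump.

```php
<?php
// Hypothetical helper: return the file names in $fileNames that belong to
// $propertyId. The ID must not be preceded or followed by another digit,
// so 13867 does not match 138671a.jpg.
function find_photos(array $fileNames, $propertyId)
{
    $pattern = '/(?<!\d)' . preg_quote($propertyId, '/') . '(?!\d).*\.jpg$/i';
    // preg_grep() keeps the original keys, so reindex the result
    return array_values(preg_grep($pattern, $fileNames));
}

// File names taken from the examples in the question
$files = array('138671a.jpg', '138671b.jpg', '138672a.jpg',
               'clsc24900d.jpg', 's339447_701_14.jpg', 's339447_801_22.jpg');
print_r(find_photos($files, '138671'));   // the two 138671 photos
```

The same digit-boundary idea could be applied in MySQL with REGEXP instead of LIKE, though that is not tested here.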

gruntar Commented:
That approach will get you off the hook for a while. Meanwhile you should think of something that will be easy to maintain. I can't help you because I am not familiar with the system.

Regards

_GeG_ Commented:
There is a Unix command you might use: locate.
First tell locate to build a database for the path where your pictures are:
<?php
exec('locate -U /path/to/pictures -o databasefile');
?>
Now, when you are looking for an image for property 339447:
<?php
$properties='339447';
// exec() returns only the last line of the command's output
$file=exec("locate -d databasefile '*$properties*'");
if ($file){
    echo "The file: $file";
} else {
    echo "Not found";
}
?>
You should rebuild the database after each update, or with a cron job.

_GeG_ Commented:
BTW, you must have write access to the file databasefile ;)
But you can give databasefile any name or path, i.e.
locate -U /path/to/pictures -o /var/web/jonny/www/mylocatedatabase
is also valid.

eastwop (Author) Commented:
GeG, I like your solution: elegant, with little change needed to my current system.

I ran locate on my photos directory, and it created a 1 MB file.
Then, if I run the locate -d command from the command line, looking for a property ID in the file, it successfully returns all the files that include that property ID.
However, when I try to run the exact same command from exec() in my PHP script, it doesn't return any output?

I have made sure to give the full path to locate (/usr/bin/locate), and the full path to my locate database file. I also tried using the $output option in exec(), which should return an array of all output lines, but the array is always empty. To eliminate possible permission issues, I made the database file 777 and the directory the file is in 777.
I tried using passthru() and system(), but with no luck.
I checked my Apache error logs and there are no errors showing up. When I put in an incorrect path to the locate database file, an error does show up in the Apache error log.

Any ideas of what I may be doing wrong?

I will split the points if someone else besides GeG helps me solve my issue with no output showing up. GeG has set me on the right track with the locate approach.

_GeG_ Commented:
There are a few things that could be the problem.
locate returns nothing if it finds nothing (typical Unix behaviour),
so either you find nothing because you are looking for the wrong file,
   which is easy to check:
   <?php
   $properties='339447';
   $code="locate -d databasefile '*$properties*'";
   echo "$code\n";
   echo exec($code);
   ?>
   If this echoes 2 lines, everything is fine. If not, just copy and paste the output of echo $code to the bash prompt and see what happens.
   I had this problem once when there was white space in the filename; that's why I put the ' in the locate command line.
or you cannot execute programs.
   If
   <?php
   echo `ls -la`;
   ?>
   outputs something, this is not the problem.

hth

eastwop (Author) Commented:
I have been echoing the command being passed to exec() in the PHP script, and then copying and pasting it to the bash prompt. In the PHP script it returns no output; at the bash prompt it does return 4 lines of output (found 4 files). And I'm really doing a simple test exactly like the one you have above, with a property ID that I know exists in the locate database.

When I run exec("ls -al /filepath"), it does return output. I use this a lot and haven't had any problems.

It would be nice if exec() returned more information.
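
exec() can report more than its return value suggests: its second argument collects every output line and its third receives the command's exit status. Appending 2>&1 also folds error messages into the captured output. A small sketch, where run_command() is a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper around exec(): captures all output lines (including
// stderr, via 2>&1) together with the command's exit status.
function run_command($command)
{
    $lines  = array();
    $status = 0;
    exec($command . ' 2>&1', $lines, $status);
    return array('status' => $status, 'lines' => $lines);
}

// Example: a command that fails now shows why, instead of returning nothing
$result = run_command('ls /no/such/directory');
echo "exit status: {$result['status']}\n";
echo implode("\n", $result['lines']), "\n";
```

Running the locate command through a wrapper like this would at least show whether it exits with an error or simply finds nothing when called from the web server's user account.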

eastwop (Author) Commented:
Has anyone any further ideas on why running locate within exec() in a PHP script won't return any output, while the same locate command works fine at the bash prompt?

_GeG_ Commented:
Sorry for the late reply...
I am pretty sure now that it must be an access rights problem.
First check if exec('ls /path/to/image'); returns anything. If not, check the access rights for the images and for the directories along the path; the directories must have the x flag set.
Then try the script like this:
<?php
exec('locate -u -o databasefile');
echo `locate -d databasefile '*'`;
?>
Be careful, this will be a very long list: everything that PHP/Apache has access to. If you view it with a browser, view the source to make it readable. Now check where the path to the image directory disappears. At that point there is the rights problem ;)

I wonder if we found it this time :(

eastwop (Author) Commented:
I tried running what you said, and it did produce a long list of files on my system. However, it didn't include files in my home directory for any of the domains hosted there? The only mentions of the domain where I ran the script from were in the etc and var directories (valiases etc).

However, in my own testing, I think I am a little closer to finding out what the problem is, taking PHP and exec() out of the equation.

1. At the command line, I logged in as a specific user (not root). I created a folder called test. In that folder I created a subdirectory called 'photos'. In that subdirectory I placed 10 property photos (e.g. 397522.jpg, 397523.jpg).

2. I did a chown -R user:user on the test directory and all files beneath it. I did a chmod -R 777 on the test directory and all files beneath it.

3. At the command line, while in the test directory, I typed:
    locate -U photos -o photosdbfile

4. The indexed file gets created successfully. I made the file 777 and then typed:
    locate -d photosdbfile '*397*'
but nothing gets returned.

5. If I log in as root, cd to this test folder and run the exact same command, it does find files with 397 in their path.

My conclusion is that a user on my server can use 'locate' to create an indexed file, but doesn't have permission to search for files in that indexed file?? I've tested the above scenario using a PHP script to create the indexed file and then search it, but got no output.

Perhaps I could change the permissions related to how locate works, and allow just one regular user to search locate indexed files? I'm not sure how I would go about that though.

_GeG_ Commented:
Let's say your image directory is /home/test/photos.
Then run chmod -R a+x /home
and try again.
It is very important that all directories along the path to your image directory have the x flag set for everyone.

eastwop (Author) Commented:
I have done this, making the user's folder (/home/username/) and all subfolders a+x. I have confirmed with ls that this is the case. When I create the locate indexed file, I run chmod a+x on it so that it is executable by group and world. But I am still getting no output when I run locate -d on the indexed file while logged in as a certain user, whereas I do get output with the same command run on the same files as root. I'm thinking the issue has to be a system permission governing users using locate?

_GeG_ Commented:
I don't know. I use Gentoo, and it works as described. What distro are you using?
BTW, is x also set for /home?

eastwop (Author) Commented:
Here are my server settings
Linux 2.4.21-20.ELsmp #1 SMP i686 i686 i386 GNU/Linux
Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a

Accounts are handled by cPanel; perhaps that has something to do with it? If I knew of a conf file for locate I could try tweaking that, but I don't see anything. I did a Google search, but it didn't come up with much to do with my issue.

eastwop (Author) Commented:
And yes, /home has drwx--x--x as its permissions.

_GeG_ Commented:
Sorry, I give up. My internet server's hard drive has just crashed, and now I have to work and answer the telephone, and tell everybody it's going to be alright, soon???, sigh...

eastwop (Author) Commented:
I finally got it working. In creating the locate indexed file I had to turn the security level to 0 using the -l option for locate.

e.g. locate -U photos -l0 -o photosdbfile

I will award half the points as promised to _GeG_. Thanks!
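
For reference, a rough sketch of how the lookup could be wired back into the page once the locate database is in place. The helper names locate_command() and pick_photo() are made up for this example, the database path is a placeholder, and /usr/bin/locate is the path mentioned earlier in the thread.

```php
<?php
// Hypothetical helper: build the locate command for one property ID.
// escapeshellarg() keeps the database path and pattern safe for the shell.
function locate_command($dbFile, $propertyId)
{
    return '/usr/bin/locate -d ' . escapeshellarg($dbFile)
         . ' ' . escapeshellarg('*' . $propertyId . '*.jpg');
}

// Hypothetical helper: choose the photo to display from the matched paths,
// falling back to the generic image when nothing was found.
function pick_photo(array $matches, $fallback)
{
    return array(
        'photo'    => count($matches) > 0 ? $matches[0] : $fallback,
        'multiple' => count($matches) > 1,
    );
}

// Usage (paths are placeholders):
// exec(locate_command('/path/to/photosdbfile', '339447'), $matches);
// $choice = pick_photo($matches, 'photonotavailable.jpg');
```

Keeping the command building and the photo selection separate makes each part easy to check on its own, without needing locate installed.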

_GeG_ Commented:
Normally this is not the recommended solution, because it doesn't check access rights. I still think that there is an access rights problem somewhere in your picture path. But I am glad that it works for you now. And I will remember -l0 in case of problems ;)
Good luck!

_GeG_ Commented:
ok :D