image database design and implementation

I am supposed to develop a web-based application for a real estate company, that includes a lot of pictures. My question is how to organize the pictures: database or filesystem? I have read somewhere that the filesystem is a more appropiate solution, does anyone have some experience with that? Can anyone give me some good reasons why I should proceed this way?

Furthermore: how should I organize the directory structure? A real estate company has more offices, each office employs more agents. One agent has more listings (houses to sell/lease). One house has a main picture, up to 24 additional pictures (a slideshow) and maybe a virtual tour/panorama. Each image should be available in different resolutions (2 or three). What is the best directory hierachy to store all these images? (The application will be developed in php+mysql and will probably run on a Linux box with an Apache server)

Any piece of advice concerning the design of this application is welcome.
Who is Participating?
minnirokConnect With a Mentor Commented:
I wouldn't put all my pictures in one basket - having every picture in a single directory means you're relying exclusively on the database to determine which property each picture belongs to.  This isn't necessarily a problem, but I prefer putting my socks, pants, and shirts in seperate draws rather than in a single big pile.  If you use a subdirectory for each property, the filesystem itself will already have within it all relevent photos without a need to traverse a single giant directory for the correct file.

Zastil, regarding the speed issues with 1 directory vs many; for the example you gave in terms of pulling out the top 5 properties, the processing required to change traverse directories is negligable.  I'd compare it to complaining about having to open draws to get dressed and prefering to just throw on yesterday's clothes.  The whole purpose of directories is to help organize, so you shouldn't be afraid to use them.

Of course, you wouldn't want so many subdirectories that they get in the way, like putting each sock in a seperate drawer.  For instance your suggestion for an image being listed as:


appears too conveluted to be of help.  For instance, if your agent changes office, you'd need to completely rearrange the file structure.  Because images belong DIRECTLY to the PROPERTY, you should use the subdirectory of the property code directly:


Each property code would be stored in the database as a unique ID.  You would use the database to assign the appropriate office code and agent code, which allows alot more flexability with the system; you'll simply have to adjust values in the database rather than messing around with the file system.

As an example of some of the database tables you might have:

officeID (PK)

agentID (PK)

agentID (PK composite)
officeID (PK composite)

propertyID (PK)

imageID (PK)
propertyID (foreign K)
resolution    enum('L', 'M', 'S')

This assumes that an agent may have multiple offices, otherwise you would get rid of tblagentoffice and simply list officeID as a foreign key within tblagent.

You could name each image within the subdirectories for properties according to their imageID.  So for propertyID #3005 (ie property code)


would hold all the relevent images for that property;

each image could in turn be named after their imageID:

This of course is just one possible implementation.  It would be more efficient to use a composite key to identify the image names; perhaps combining the three fields to form an identification for the image:

imageID (PK)
propertyID (PK composite)
resolution   (PK) enum('L', 'M', 'S')

so you could number the first image for property #3005 simply as #1 (or you could name it) and append the resolution to the end of it: so for your first image of a large resolution (L) you would store is as:


Again, it's just takes time figuring out how you want to implement the system.  Hope you consider using subdirectories, I'd recommend using them whenever they can help organize files.  Zastil's system would also work; I just would cringe opening that image folder on a windows machine!
Yes you are correct use the filesystem to store ya images with the reference in you database to those images.

Keep all the images in the same directory as sub-directories can get messy trust me I have done many a real estate website.

As far the the images name then in its simplest form it could just be the unique image reference from the database for the name i.e. 1.jpg, 2.jpg and so on.
Table structure I use for images is

id_image ' Unique column
id_property ' Property ID related to the image
img_name ' if you wish to use a name else can leave this column and just use the id_image for the name
order_by ' You agents will likely require this as once uploaded may wish to bump images up in the way they appear in the gallery
description ' I use this for the image mousever and also img meta tag.
Easily Design & Build Your Next Website

Squarespace’s all-in-one platform gives you everything you need to express yourself creatively online, whether it is with a domain, website, or online store. Get started with your free trial today, and when ready, take 10% off your first purchase with offer code 'EXPERTS'.

My company uses a third party workflow system that stores lots of scanned and faxed images.  I agree with zastil that saving the images on the file system is the better approach.

We started out saving the images to a single directory and then switched to saving them in a separate folder by date.  I like having the date folders better.  When you have thousands of images and you need to access one of them, it takes forever to open the single folder with all those images.  It also is easier to find a particular image when you have them segregated by create date.  I don't have to view the images from the file structure that often, but when I do it is much easier with the date folders.  In addition, if you don't need to keep the images forever, you can create an archive or clean up process that simply deletes the date folder instead of having to delete the individual images.

As far as naming, this product uses the date as part of the image name and a sequential suffix.  
The filesystem approach to image-based databases is widely used for traditional applications, and would work well for your real-estate project.  In terms of a performance approach, having alot of binary data being returned from the database can slow things down.  Plus, it may make you feel easier seeing a file resting in a directory structure rather than sitting as a BLOB in a mysql database.

In terms of a design perpective, however, storing images in a database is the more elegant solution since it better ensures syncronicity between data stored about the image and the image itself.  For instance, in a file system, the image might be inadvertantly modified or deleted while the data stored about that image remains unchanged in the database - meaning the database could be inaccurate.  If the image itself is stored within the database, this is much less likely to happen.

The database implementation should be perfectly fine for storing a stock thumbnail for each property.  It's only when you have masses of photos and potential user requests all at the same time that you'd be straining your mysql server's capabilities.

as an example of retrieval of an image from a database using php/mysql, if I had an image in my database assigned the ID of 3, I could display it through php with something like:

$query = "SELECT imageID, data FROM tblimage WHERE imageID = '3';";
$result = mysql_query( $query, $db_link );
header("Content-type: image/jpeg");
echo mysql_result( $result, 0, "data" );

This is using a table tblimage with an integer imageID and a BLOB, data, which stores the binary data of the image file.  Because of the header declaration, the page itself renders as a jpg image.  You simply call the image_print.php file from the source attribute of an image tag from a layout page:

<img src="image_print.php?id=3" />

and a query string can be used to allow the image_print.php page to dynamically disiplay any image requested.  In this example it's only hardcoded.

If you decide to go w/ the file system, I'd recommend creating subdirectories for each property, named after the property's ID in the database, with all the relevent photos stored within the appropriate dir named after their own unique ID#s.  This sort of filesystem design would probably be the best choice for your project.
muntelAuthor Commented:
Thanks, guys! Yes, I will proceed with the filesystem approach, as most of you suggested.

Zastil, I don't think that the "one single folder" perspective will be very helpful. The whole thing will eventually grow very large and therefore needs to be more organized, so i think that I woould be better off with minnirok's suggestion about separate folders for each property.

Lori, it seems to me taht the date approach it not very relevant to my project, although you have a good point there: it might be very appropriate for e-commerce or related areas.

Minnirok: good and elaborate suggestions. One thing though: I doubt that the <img src="image_print.php?id=3" /> is advisable. For instance, if an agent wants to browse the pictures assigned to a house (up to 24), then image_print.php would get called a lot of times (one time per image), which would be an overhead for my server. i have developed a script that takes the directory as parameter and outputs all the images within it on the webpage.

I want to create a directory structure that looks something like this: /base_image_dir/officecode/agentcode/propertycode/image_file.jpg. It reflects the logical structure of my application (an office emplyes more agents, an agent has more listings etc.) However, do you have any pieceof advice on how to cope with the different image resolution thing? I am supposed to upload and store each image in different sizes/resolutions (about 2, maybe 3).

Any idea how to reflect that in my directory structure? (just this one last question and I'll give away to well-deserved points)

Zastil, I don't think that the "one single folder" perspective will be very helpful. The whole thing will eventually grow very large and therefore needs to be more organized, so i think that I woould be better off with minnirok's suggestion about separate folders for each property.

Okay, Just to let you know i have over 10,000 images in 1 directory and no issues. You refer to you images in code and never yet even looked in that directory to find a file. If you image managing system is written well you will not get any problems. Also sure I read somewhere that having them all in same directory speeds up the access to the images (i am not sure on this though).
Just to elaborate more on the speed issue with 1 directory.

If you was bringing out the top 5 properties for example simply pull em straight from the 1 directory. Else the system would be jumping back and forth into directories thus consuming more processing time?
muntelAuthor Commented:
good point there, zastil. I think I will adopt the "one dir" approach for the main pictures of the properties. (each house has one "main" picture and up to 24 additional ones). In that way, the searches issued by users will be fast (i.e. browsing just one dir) and will bring up a short description of the houses that match the search criteria, along with the main picture/thumbnail of each house. clicking on a house will bring up a pop-up window that will display the full profile of the property, i.e. with the additional pictures in a slideshow (again, browsing just one dir, the one that corresponds to the property in my dir structure)

so, maybe a hybrid approach would suit me better, don't you think?

again, I need more suggestions on the different resolution/sizes issue. Any ideas, guys?
I think your hybrid approach sounds very good.  You gain the performance benefits by having your main pictures in a single directory and then if someone wants to drill down on a property you can go to its individual folder.  And you are right, segregating by property code is a much better approach than date for your application.

For naming for the different resolutions, I would probably go with a standard suffix you could append to the file name to indicate the resolution.  I.E., img1res1.jpg, img1res2.jpg, img1res3.jpg.  Or, you could go with a completely separate file structure for each resolution.  I.E., 3 folders for the "main" pictures, one for each resolution.  I'm assuming that once a person has chosen a resolution, they wouldn't change mid-stream, so that wouldn't cause too much jumping around inside your file structure.
muntelAuthor Commented:
Yes, Lori99, that would probably be a good idea. As for the filenaming convention, I will add a timestamp as part of the filename, so there are no naming conflicts and no files get overwritten.

A good idea was suggested by zastil earlier: to keep an index of the order that the files appear in a slide show and some description of the image.
if you are using the id_image then you will not get any naming conflicts, but yes no problem in using a timestamp also if you wish.
Or how about this.  If you added a resolution column to your property_images table, you could use that to locate the image with the proper resolution and go straight to it.  Then you could keep all images for a property regardless of resolution in a folder named by property code and you wouldn't need to have any intelligence in the naming.  Here's how I think the table definintion would look.


image_type (main image or additional images)
additional columns as suggested by zastil earlier
a lot of this will depend on volume.  we store document images in the realm of millions of documents and several tens of terabytes of physical storage.  having architected the move from a single directory system....i can tell you first hand the downfalls there are in this kind of approach in _some_ types of environments.  if the web site is literally the only thing pulling the images, then it's probably okay - outside of the fact that you may actually be testing the physical boundaries of a filesystem.  so the question then becomes, what is the maximum number of files you can store in a single directory on the given platform?  if you have inside people working the docs/images, then you have to ask how much you trust them and their capability and grant access/permissions accordingly which requires the appropriate segmentation. (ie. Glenda didn't know she was deleting images for the entire company, she thought those were all her images).  if the management portion is web-based then such an enforced segmentation at the filesystem is a good thing from a security point of view; but it would need to match whatever functionality is needed.  third, if you are required to perform document/image maintenance, especially anything customer driven.  ie. batch archival/indexing, then the organizational approach becomes apparent, especially if your sql service is already burdened to any degree.  if your sql server and your fileserver are one in the same....the seek time alone, repetitively through a large number of images in a single unit of isolation would kill performance.  it's a fact of computer science: consider o/ring classifications, where a single or a few jumps to a significantly smaller set to be searched/iterated through is far less expensive than iterating through the entire set; thus the desirability for organized structures like trees over arrays.

integrated images work extremely well until you hit that boundary.  do the math.  if you're never going to exceed n images at an average of z size, and that, coupled with the other data is within the bounds of the abilities of the rdbms - then i would seriously consider the option.  the trick there is knowing the upper limit on the number of images you will keep, and then sticking with it.

the image wrapper is not that significant an overhead and allows for greater functionality (especially when security objectives are taken into account) _and_ actually assists you in your last question on resolution.  ie. client end setting for desired resolution applied at a point of fetching an image.  this process is then transparent to the user outside of the actual static setting, and can be overriden if needed for a direct query for a specific image resolution.


database filesystems are the next generation of integration that would probably assist you.  filenet is a pretty standard early-release/low-integration product.  it's more suited to documents than images, and isn't cheap.  but it's value is unparalled to those who make use of it.  microsoft will have an integrated database filesystem in longhorn, which i realize is not the target platform - but it's where things are headed for these kinds of solutions.  but since you are using linux, have you ever considered creating a separate partition of custom type with a modified file structure to hold all of the fields necessary?  we created a modified version of bsd's ffs for a web caching solution that made use of file metadata in a similar fashion and it was a huge success in terms of efficiency/consistency between physical storage and everything that pointed to it (because it was all one in the same).  just an idea.
"Zastil, regarding the speed issues with 1 directory vs many; for the example you gave in terms of pulling out the top 5 properties, the processing required to change traverse directories is negligable.  I'd compare it to complaining about having to open draws to get dressed and prefering to just throw on yesterday's clothes."

i'd compare it to having to go through your socks one by one until you get to your underwear where you again go through it one by one then getting to your by one, and finally hitting the desired by one until you get the right one.  "speed" isn't an advantage for the singular container model, it simply doesn't scale.  as opposed to going to the shirt drawer first....because that's where the shirts are.

Ok Droby10 have 1,000 people try and get those clothes out your draws better they was all in a line than all trying to bundle into you draws! haha
muntelAuthor Commented:
That was a very elaborate answer, minnirok, thanks for everything.

Yes the idea of putting images in propertycode folders was the first that went through my mind, but the problem is that the number of properties will be large. I wonder if there's a maximum number of subdirs in a dir under Linux (sorry, I'm not a Linux freak... my application will run on a Linux Red Hat 8 machine, I'd better check out this maximum dirs thing first).

That is way I thought of separating the property codes under agents folders, and agents under offices forders. You're right, if an agent switches offices, there might be some overhead moving his folder to the new office's folder, but this things do not occur very often, so...

Anyway, you sure do deserve the points, I'll accept your answer.

Many thanks to everybody!

"Ok Droby10 have 1,000 people try and get those clothes out your draws better they was all in a line than all trying to bundle into you draws! haha"

sounds more like government office politics than sound computing to me....but what do i know?

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.