How scalable is this?

Hello all
We have a box from which we ftp data.
The data is stored in the filesystem, in a hierarchy of directories and subdirectories, for example:
So to look up a file, you have to find the right product, navigate down to the minute the product was produced, and then parse to find the right file.
There are rarely more than half a dozen files of the same product produced in the same minute, and commonly only a few.
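To make the lookup concrete, here is a minimal sketch of it in Python. The directory layout (product/year/month/day/hour/minute) and the function names are assumptions for illustration only, not the box's real layout.

```python
import os
from datetime import datetime

def minute_dir(root, product, when):
    """Build the directory for one product and one minute.

    The product/YYYY/MM/DD/HH/MM layout is a hypothetical example.
    """
    return os.path.join(
        root, product,
        when.strftime("%Y"), when.strftime("%m"), when.strftime("%d"),
        when.strftime("%H"), when.strftime("%M"),
    )

def find_files(root, product, when):
    """Return the sorted file names stored for that product and minute."""
    try:
        return sorted(os.listdir(minute_dir(root, product, when)))
    except FileNotFoundError:
        # Nothing was produced in that minute.
        return []
```

Under this assumed layout the lookup walks a fixed number of directory levels and then lists a directory holding only a handful of files, however many files exist in total.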

Some people want to use this box for a much higher volume product, and there are concerns that such a lookup system won't scale, so I thought I'd ask the experts for their opinions.
The people who work with the box regularly say it can scale, but those who don't think that not using a modern RDBMS is ridiculous.
Thanks much for your input, and I'll also give at least the few best answers some points, too, maybe 50 each.
jlevie Commented:
Well, since the data is accessed by FTP, the network and the FTP service will be the first things to limit the process. The actual data transfer via FTP is an efficient means of moving data between systems, but the overhead of getting from the initial FTP connect to the get/put is significant. You also have to consider the limits of the OS and hardware, such as the maximum number of processes, the maximum number of open files, installed memory, etc. Obviously, if lots and lots of FTP sessions are active, you could easily get into a situation where the server starts to swap, which would significantly impact access time. That is where a number of servers, each holding a portion of the storage, has an advantage.
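As a rough way to see some of these limits on a Unix-like server, something like the following Python sketch could be used. It is only an illustration: the function name is made up, and it assumes a platform where `RLIMIT_NPROC` and the `SC_PHYS_PAGES` sysconf name are available (Linux, for example).

```python
import os
import resource

def process_limits():
    """Report a few OS limits that bound the number of concurrent FTP sessions."""
    return {
        # (soft, hard) limit on open file descriptors per process
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        # (soft, hard) limit on processes for the current user
        "processes": resource.getrlimit(resource.RLIMIT_NPROC),
        # installed physical memory in bytes
        "physical_memory_bytes": os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE"),
    }

if __name__ == "__main__":
    for name, value in process_limits().items():
        print(name, value)
```

Comparing these numbers with the expected number of simultaneous FTP sessions gives a first idea of where the server would run out of headroom.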
Where is the concern about scalability? In terms of storage, access time, or FTP server load?

It would seem to me that storage would not be an issue with this sort of layout unless there could be more files located in a particular minute than there were inodes on the disk that contained them. The layout of the storage lends itself to multiple mount points, so even a humongous amount of 'product' could be easily handled.
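The inode headroom on a given mount point can be checked with a short Python sketch like this one (Unix-like systems only; the function name is an assumption, and some filesystems report inode counts differently or as zero):

```python
import os

def inode_usage(mount_point):
    """Return total and free inode counts for the filesystem holding mount_point."""
    stats = os.statvfs(mount_point)
    return {
        "total": stats.f_files,   # total inodes on the filesystem
        "free": stats.f_ffree,    # inodes still unused
    }

if __name__ == "__main__":
    print(inode_usage("/"))
```

Each file and each directory takes one inode, so the free count is roughly the number of additional files that filesystem can hold regardless of free disk space.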

Likewise I doubt that access time would be an issue. There are other limits that would come into play long before traversing the file system(s) would be a problem.

Server load could well be an issue. Presumably, if a much higher volume product is to be handled, there would be a higher volume of FTP traffic, both in placing the product on the server and in retrieving it. This is an area where I don't see an RDBMS helping all that much. Distributing the existing file system across multiple servers looks to be a better solution as far as scalability is concerned. That way both the data storage and the FTP load are distributed. With an RDBMS you'd still have a single choke point, even if there were multiple FTP servers.
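One possible way to spread the file system across servers, sketched in Python, is to assign each product to a server by a stable hash of its name. This is just one assumed scheme, and the host names and function name are hypothetical.

```python
import hashlib

# Hypothetical FTP servers, each holding a portion of the products.
SERVERS = ["ftp1.example.com", "ftp2.example.com", "ftp3.example.com"]

def server_for(product, servers=SERVERS):
    """Pick the server that stores a product, using a stable hash of its name."""
    digest = hashlib.sha256(product.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because the hash depends only on the product name, every client computes the same server for the same product without asking a central service. Note that with this simple modulo scheme, changing the number of servers moves most products to a different server.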

So it would seem to me that a file system based FTP repository would scale better than an RDBMS based system.
vlg (Author) Commented:

Thanks for your response - we're primarily concerned about access time.  You mentioned:

There are other limits that would come into play
long before traversing the file system(s) would be a problem.

Can you tell me which limits those are, and why they come into play first?

Thanks again

vlg (Author) Commented:
Thanks much!
Question has a verified solution.
