Solved

How scalable is this?

Posted on 2002-04-10
Last Modified: 2010-04-21
Hello all
We have a box from which we ftp data.
The data is stored in the filesystem, in a hierarchy of directories and subdirectories, for example:
productx--
         |
         year
            |
            month
                 |
                 day
                   |
                    hour
                       |
                       minute
 
So to look up a file, you have to find the right product, navigate down to the minute the product was produced, and then parse that directory to find the right file.
There are rarely more than half a dozen files of the same product produced in the same minute, and commonly only a few.
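To illustrate the lookup described above, here is a minimal sketch of how a product name and timestamp could map to a directory. The root path `/data`, the zero-padded directory names, and the function name `minute_dir` are assumptions for illustration; the actual naming on the box is not stated in the question.

```python
from datetime import datetime
from pathlib import PurePosixPath


def minute_dir(root, product, ts):
    """Build product/year/month/day/hour/minute under root.

    Zero-padded names (e.g. "04" for April) are an assumption.
    """
    parts = [f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{ts:%H}", f"{ts:%M}"]
    return PurePosixPath(root, product, *parts)


if __name__ == "__main__":
    # Hypothetical root and product name.
    print(minute_dir("/data", "productx", datetime(2002, 4, 10, 13, 5)))
```

Under these assumptions the number of directory steps is fixed at six no matter how much data is stored, so the only part that grows with volume is the number of files in the final minute directory.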

Some people want to use this box for a much higher volume product, and there are concerns that such a lookup system won't scale, so I thought I'd ask the experts for their opinions.
The people who work with the box regularly say it can scale, but those who don't think that not using a modern RDBMS is ridiculous.
Thanks much for your input, and I'll also give at least the few best answers some points, too, maybe 50 each.
Vic
Question by:vlg
LVL 40

Expert Comment

by:jlevie
ID: 6932721
Where is the concern about scalability? In terms of storage, access time, or FTP server load?

It would seem to me that storage would not be an issue with this sort of layout unless there could be more files located in a particular minute than there were inodes on the disk that contained them. The layout of the storage lends itself to multiple mount points, so even a humongous amount of 'product' could be easily handled.
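One way to check how close a file system is to the inode limit mentioned above is a sketch like the following. It assumes a Unix-like system where Python's `os.statvfs` is available; the helper name `inode_usage` is made up for illustration, and some file systems report zero for the inode totals.

```python
import os


def inode_usage(path):
    """Return (total, used, free) inode counts for the file system holding path."""
    st = os.statvfs(path)   # Unix-only call
    total = st.f_files      # total inodes on the file system
    free = st.f_ffree       # free inodes
    return total, total - free, free


if __name__ == "__main__":
    total, used, free = inode_usage(".")
    print(f"inodes: total={total} used={used} free={free}")
```

Running this against each mount point that holds product data would show whether inodes, rather than disk space, are likely to run out first.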

Likewise I doubt that access time would be an issue. There are other limits that would come into play long before traversing the file system(s) would be a problem.

Server load could well be an issue. Presumably if a much higher volume product is to be handled there would be a higher volume of FTP traffic, both in placing the product on the server and in retrieving it. And this is an area where I don't see an RDBMS helping all that much. Distributing the existing file system across multiple servers looks to be a better solution as far as scalability is concerned. That way both the data storage and the FTP load are distributed. With an RDBMS you'd still have a single choke point, even if there were multiple FTP servers.

So it would seem to me that a file system based FTP repository would scale better than an RDBMS based system.
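As a rough sketch of what distributing the file system across servers might look like, the following maps each product to one of several FTP servers with a stable hash. The host names and the function `server_for` are hypothetical, and hashing by product name is only one possible split; assigning products to servers by hand would work just as well.

```python
import hashlib

# Hypothetical hosts; each would hold the full year/month/... tree
# for the products assigned to it.
SERVERS = ["ftp1.example.com", "ftp2.example.com", "ftp3.example.com"]


def server_for(product, servers=SERVERS):
    """Pick a server for a product using a hash that is stable across runs."""
    digest = hashlib.sha256(product.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for name in ("productx", "producty"):
        print(name, "->", server_for(name))
```

A client only needs the product name to know which server to connect to, so there is no central lookup service that every request has to pass through.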

Author Comment

by:vlg
ID: 6934423
jlevie

Thanks for your response - we're primarily concerned about access time.  You mentioned:

"There are other limits that would come into play long before traversing the file system(s) would be a problem."

Can you tell me which limits those are, and why they come into play first?

Thanks again

Edmund
LVL 40

Accepted Solution

by:
jlevie earned 100 total points
ID: 6934682
Well, since the access to the data is by FTP, the network and FTP service will limit the process first. The actual data transfer via FTP is the most efficient means of moving data between systems, but the overhead associated with getting from the initial FTP connect to the get/put is significant. Also you have to consider what limits the OS and hardware have, like max number of processes, max open files, installed memory, etc. Obviously, if there are lots and lots of FTP sessions active, you could easily get into a situation where the server would start to swap, which would significantly impact access time. And that's where a number of servers, each with a portion of the storage, has an advantage.
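For the OS limits mentioned above, a minimal sketch of how to inspect two of them from Python on a Unix-like system follows. It uses the standard `resource` module; the helper name `process_limits` is made up for illustration, and `RLIMIT_NPROC` is not defined on every platform, so it is looked up defensively. These are per-process limits for the current user, not system-wide totals.

```python
import resource


def process_limits():
    """Return {label: (soft, hard)} for the limits this platform exposes."""
    wanted = {"open files": "RLIMIT_NOFILE", "processes": "RLIMIT_NPROC"}
    found = {}
    for label, const in wanted.items():
        if hasattr(resource, const):  # RLIMIT_NPROC is missing on some systems
            found[label] = resource.getrlimit(getattr(resource, const))
    return found


if __name__ == "__main__":
    for label, (soft, hard) in process_limits().items():
        # A value equal to resource.RLIM_INFINITY means "unlimited".
        print(f"{label}: soft={soft} hard={hard}")
```

Comparing these numbers against the expected count of simultaneous FTP sessions gives a first estimate of when the server, rather than the directory layout, becomes the bottleneck.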

Author Comment

by:vlg
ID: 6934972
Thanks much!