Solved

How scalable is this?

Posted on 2002-04-10
Last Modified: 2010-04-21
Hello all
We have a box from which we ftp data.
The data is stored in the filesystem, in a hierarchy of directories and subdirectories, for example:
productx--
         |
         year
            |
            month
                 |
                 day
                   |
                    hour
                       |
                       minute
 
So to look up a file, you have to find the right product, navigate down to the minute the product was produced, and then parse to find the right file.
There are rarely more than half a dozen files for the same product produced in the same minute, and commonly only a few.
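
A minimal sketch of the lookup described above, for anyone who wants to time it. It assumes the directory names are zero-padded numbers (for example `2002/04/10/09/05`), which the post does not state; the function names `minute_dir` and `find_files` are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def minute_dir(root: Path, product: str, ts: datetime) -> Path:
    """Build the product/year/month/day/hour/minute directory for a timestamp.

    Assumption: directory names are zero-padded numbers.
    """
    return (
        root
        / product
        / f"{ts.year:04d}"
        / f"{ts.month:02d}"
        / f"{ts.day:02d}"
        / f"{ts.hour:02d}"
        / f"{ts.minute:02d}"
    )


def find_files(root: Path, product: str, ts: datetime) -> List[Path]:
    """Return the files stored for one product at one minute, sorted by name."""
    directory = minute_dir(root, product, ts)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())
```

The path is computed directly from the product name and timestamp, so the lookup never searches the tree. Only the final directory, which holds a handful of files, is listed.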

Some people want to use this box for a much higher volume product, and there are concerns that such a lookup system won't scale, so I thought I'd ask the experts for their opinions.
The people who work with the box regularly say it can scale, but those who don't work with it think that not using a modern RDBMS is ridiculous.
Thanks much for your input, and I'll also give at least the few best answers some points, too, maybe 50 each.
Vic
Question by:vlg
 
LVL 40

Expert Comment

by:jlevie
ID: 6932721
Where is the concern about scalability? In terms of storage, access time, FTP server load?

It would seem to me that storage would not be an issue with this sort of layout unless there could be more files located in a particular minute than there were inodes on the disk that contained them. The layout of the storage lends itself to multiple mount points, so even a humongous amount of 'product' could be easily handled.
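
To see how close a file system is to the inode limit mentioned above, something like the following could be used. It is a sketch for Unix-like systems only, built on the standard `os.statvfs` call; the helper name `inode_usage` is hypothetical, and some file systems report zero for these counters.

```python
import os


def inode_usage(path: str) -> dict:
    """Report inode counts for the file system that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the file system
    free = st.f_ffree    # inodes still unallocated
    return {"total": total, "free": free, "used": total - free}


if __name__ == "__main__":
    # Assumption: run from (or pointed at) the mount point holding the product tree.
    print(inode_usage("."))
```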

Likewise I doubt that access time would be an issue. There are other limits that would come into play long before traversing the file system(s) would be a problem.

Server load could well be an issue. Presumably if a much higher volume product is to be handled there would be a higher volume of FTP traffic, both in placing the product on the server and in retrieving it. And this is an area where I don't see an RDBMS helping all that much. Distributing the existing file system across multiple servers looks to be a better solution as far as scalability is concerned. That way both the data storage and the FTP load are distributed. With an RDBMS you'd still have a single choke point, even if there were multiple FTP servers.
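
One way to picture the distribution across servers is to route each product to a fixed server. The sketch below is only an illustration under assumptions: the host names are placeholders, and hashing the product name is one possible scheme, not something the thread prescribes.

```python
import zlib

# Hypothetical server names; substitute the real FTP hosts.
SERVERS = ["ftp1.example.com", "ftp2.example.com", "ftp3.example.com"]


def server_for(product: str, servers=SERVERS) -> str:
    """Pick the server that holds a product's tree.

    crc32 is stable across runs, so the same product always maps
    to the same server as long as the server list is unchanged.
    """
    index = zlib.crc32(product.encode("utf-8")) % len(servers)
    return servers[index]
```

Because each product's whole tree lives on one server, the existing year/month/day/hour/minute layout stays unchanged on every box. Note that changing the server list would remap products, so a fixed product-to-server table may be simpler in practice.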

So it would seem to me that a file system based FTP repository would scale better than an RDBMS based system.
 

Author Comment

by:vlg
ID: 6934423
jlevie

Thanks for your response - we're primarily concerned about access time.  You mentioned:

"There are other limits that would come into play long before traversing the file system(s) would be a problem."

Can you tell me which limits those are, and why they come into play first?

Thanks again

Edmund
 
LVL 40

Accepted Solution

by:
jlevie earned 100 total points
ID: 6934682
Well, since the access to the data is by FTP, the network and FTP service will limit the process first. The actual data transfer via FTP is the most efficient means of moving data between systems, but the overhead associated with getting from the initial FTP connect to the get/put is significant. Also you have to consider what limits the OS and hardware have, like max number of processes, max open files, installed memory, etc. Obviously, if there are lots and lots of FTP sessions active you could easily get into a situation where the server would start to swap, which would significantly impact access time. And that's where a number of servers, each with a portion of the storage, have an advantage.
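
The per-process limits mentioned above can be inspected from Python on Unix-like systems. This sketch uses the standard `resource` module; `RLIMIT_NPROC` is not defined on every platform, so it is looked up defensively. The helper name `process_limits` is hypothetical, and system-wide limits and installed memory are not covered here.

```python
import resource


def process_limits() -> dict:
    """Return (soft, hard) limits for open files and processes, where available."""
    wanted = {"open_files": "RLIMIT_NOFILE", "processes": "RLIMIT_NPROC"}
    limits = {}
    for label, const_name in wanted.items():
        if hasattr(resource, const_name):
            limits[label] = resource.getrlimit(getattr(resource, const_name))
    return limits


if __name__ == "__main__":
    for label, (soft, hard) in process_limits().items():
        # A value equal to resource.RLIM_INFINITY means "no limit".
        print(f"{label}: soft={soft} hard={hard}")
```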
 

Author Comment

by:vlg
ID: 6934972
Thanks much!
