MySQL as filesystem

Thanks for looking. I've got a server with over 9 million files to store (called the "archive" from now on). The archive is roughly half PDFs and half the .txt files derived from those PDFs. PDFs average about 200 kB; txt files are generally 2 kB. The files are split into sequentially numbered folders of roughly 10k files each so as not to overload the filesystem. The archive is expected to double in size.

For both redundancy and sheer size requirements we bought an Adaptec external miniSAS RAID card and ran it to an external enclosure with 4x 1.5TB SATA drives in RAID 10. This works, but it is shockingly slow. For reference, when we copied the archive (roughly 900GB total) from one 1TB non-RAID drive to another it took about 5 hours. When we copied it to the Adaptec RAID array it took almost 48 hours. My assumption is that this is because of the massive number of files (i.e. it would be much faster with a single 1TB file) and how they get stored on RAID.

So, I had an idea. Our MySQL DB is running over 100 million records and doing so just fine. So I considered setting up a new MySQL database with a table that has an index (we already have a file numbering system in place) and a blob field, then just putting the txt/pdf files in the blob. This would:
1. allow mysql to act as a filesystem
2. provide one, or a small number, of actual files to be placed on the raid array (ie the mysql table files would be stored there instead of the pdf/txt files).
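A minimal sketch of the idea, using Python's stdlib sqlite3 module as a stand-in for MySQL (the table name and schema are illustrative; in MySQL this would be an InnoDB table with LONGBLOB columns):

```python
import sqlite3

# Illustrative sketch: store files as blobs keyed by the existing file number.
# SQLite stands in for MySQL here; the schema maps directly to an InnoDB table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE archive (
        file_id INTEGER PRIMARY KEY,  -- existing file numbering system
        pdf     BLOB,                 -- ~200 kB average in practice
        txt     BLOB                  -- ~2 kB average in practice
    )
""")

pdf_bytes = b"%PDF-1.4 ..."   # placeholder payloads
txt_bytes = b"extracted text"
conn.execute("INSERT INTO archive (file_id, pdf, txt) VALUES (?, ?, ?)",
             (42, pdf_bytes, txt_bytes))

# "Opening a file" becomes a primary-key lookup:
row = conn.execute("SELECT pdf, txt FROM archive WHERE file_id = ?", (42,)).fetchone()
print(len(row[0]), row[1].decode())
```

The point of the sketch is that every file access turns into an indexed single-row lookup, and the filesystem only ever sees the database's own data files.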

So, here go the questions:

1. Would this be faster or slower?
2. Should I build one table like: column: primary index, column: pdf (blob), column: txt (blob)?
Or split it into two tables, one for the PDFs and one for the corresponding txt files? (Asking from a pure performance standpoint.)
3. I would be using InnoDB tables to make use of row-level locking, and because we won't ever be doing text searches. Is there a way to break the data file into multiple files without paging the actual table (i.e. table pdf 0-1000000, table pdf 1000001-2000000, etc.)?
4. Related to the above, should I consider paging the tables to avoid a 1TB table, or is that not a big deal? I.e., would the DB suffer in performance due to the sheer table file size?
5. Would a 9 million record table with 200 kB blobs perform worse, because of the data size, than a 9 million record table holding a very small amount of data (say int(1))? I.e., can I expect anything like the performance I get with my other tables?
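Regarding questions 3 and 4: MySQL 5.1+ has built-in range partitioning, which keeps one logical table but splits its rows across multiple physical partitions (and with `innodb_file_per_table` enabled, each partition gets its own data file on disk). A hedged sketch, with illustrative names and boundaries:

```sql
-- Illustrative only: one logical table, physically split by file number.
CREATE TABLE archive (
    file_id INT UNSIGNED NOT NULL PRIMARY KEY,
    pdf LONGBLOB,
    txt LONGBLOB
) ENGINE=InnoDB
PARTITION BY RANGE (file_id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);
```

Queries still target the single `archive` table; MySQL routes each row to the right partition by `file_id`, so no application-level paging is needed.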

Thanks for all your help.
Hugh Fraser, Consultant, commented:
The stripe size just maps blocks in a virtual device created by RAID to a physical device. The size of the files is more of an issue for the ext3 filesystem and how big the chunks of free space are that are allocated to files. Allocating more blocks when a file needs to be extended can result in lots of wasted space for small files.

Stripe size affects how much of this virtual device gets read/written to the disks at one time. Larger stripe sizes (like 1/4 to a full cylinder) increase the amount of the virtual filesystem that is read or written at one time. Therefore, I don't think that you'll buy much by exchanging multiple small files for a single database file.

The current wisdom I've seen indicates that large stripe sizes result in better performance, all else being equal.

However, writes to a striped volume are slower. With parity RAID (RAID 5/6), that's because the error-correction information is scattered across multiple drives, which must be read and updated on every write; with RAID 10, every write must be duplicated to the mirror. It's read performance that striping enhances. So your backup of the filesystem is probably highlighting the worst characteristic of the array, while reading the files from a striped volume and writing to a non-striped volume would probably show striping at its best.

So ultimately, you need to decide what behaviour you want to optimize. If your goal is to optimize write performance while still having the reliability, simple mirrored RAID is probably your best choice. But striping should give you the best read performance.

If you choose the latter, look at turning off the "last access" feature.  Also, if you have reliable power, or a UPS that can allow the OS to do an orderly shutdown, look at going to writeback journalling to reduce writes to the disk. If data integrity is of prime importance, you might consider storing the journal on another, perhaps non-striped, spindle.
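For example, the knobs mentioned above look roughly like this (device name and mount point are placeholders; test on a non-production volume first):

```sh
# Illustrative only -- /dev/sdc1 and /archive are placeholders.
tune2fs -O dir_index /dev/sdc1                 # hashed b-tree directory lookups
tune2fs -o journal_data_writeback /dev/sdc1    # default to writeback journalling
mount -o noatime,data=writeback /dev/sdc1 /archive   # skip last-access updates
```

`noatime` alone can matter a lot here: without it, every read of one of those 9 million files triggers a metadata write.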
Hugh Fraser, Consultant, commented:
What is the underlying filesystem you're using? If it's ext3, before making the leap to a database for your files, have you looked at tuning the ext3 filesystems for your non-typical application? The striped raid drives are going to squeeze about as much performance out of the physical hardware as you can, but there are many parameters that can be adjusted to increase the filesystem's performance (writeback frequency, journalling options, directory indexing, etc..). I don't have experience with as large a system as yours, but the out-of-the-box ext3 is configured more for data integrity than for performance.
thummper (Author) commented:
Yes, the current FS is ext3, but it is also ext3 on both of the non-RAID SATA drives that were used, and they got MUCH better performance. I'm wondering if, because the files are so small, there's a performance hit with striping, i.e. files smaller than one stripe. To illustrate what I mean by all drives being ext3, take the following scenario:

Drive A - 1TB ext3 - sda1
Drive B - 1TB ext3 - sdb1
Drive C - 4x 1.5TB drives raid10 (hw raid from pciex8 card) - sdc1

The total archive (roughly 9 million small files) originated on drive A and was copied to drive B in 5 hours. It was then copied from drive B to drive C, which took 48 hours. Given that A and B are the same, the read half of the two copies is the same, so the only difference, and hence the bottleneck here, has to be writing to a setup like drive B versus writing to a setup like drive C (i.e. a RAID issue under the current conditions). My assumption is that it's RAID in the situation of a bazillion small files.

I do think you have a great line of thinking here in tuning either ext3 or the RAID setup, but I'm not sure how to do either, or what the optimal settings would be. Any insight?

The possible advantage of using MySQL as the virtual filesystem, in my mind at least, is that you are now striping one (or a handful of) very large files, and backup becomes rather easy as well: you just do a mysqldump to the backup media (also subject to easy gzip compression without a massive tar operation first). To elaborate on the backup model: we looked into tape backups and were disappointed that they tend to hold a few hundred GB rather than the ridiculously high capacity I assumed. So we'll likely be backing up to additional SATA drives, which definitely can't take 48 hours each time we do it. My assumption is that with a gzipped mysqldump we can fit a 2TB DB table easily onto a standard 1 or 1.5TB drive.
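The backup model sketched above would look something like this (database name and paths are illustrative):

```sh
# Illustrative: dump the blob table and compress in one pass.
# --single-transaction gives a consistent InnoDB snapshot without locking.
mysqldump --single-transaction archive_db | gzip > /mnt/backup/archive.sql.gz

# Restore:
gunzip -c /mnt/backup/archive.sql.gz | mysql archive_db
```

One caveat on the sizing assumption: PDF data is already internally compressed, so gzip may shrink the dump far less than expected; it's worth measuring on a sample before counting on fitting 2TB onto a 1.5TB drive.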
Question has a verified solution.
