MySQL as filesystem
Posted on 2009-04-18
Thanks for looking. I've got a server with over 9 million files to store (called the "archive" from now on). The archive is roughly half PDFs and half the .txt files derived from those PDFs. PDFs average about 200kB; the txt files are generally around 2kB. The files are split into sequentially numbered folders of roughly 10k files each so as not to overload the filesystem. The archive is expected to double in size.
For both redundancy and sheer size requirements we've got an Adaptec external miniSAS RAID card running to an external enclosure with 4x 1.5TB SATA drives in RAID 10. This works, but it is shockingly slow. For reference, when we copied the archive (roughly 900GB total) from one non-RAID 1TB drive to another it took about 5 hours. Copying it to the Adaptec RAID array took almost 48 hours. My assumption is that this is because of the massive number of files (i.e. it would be much faster as a single 1TB file) and how those files get stored on the RAID.
So, I had an idea. Our MySQL DB holds over 100 million records and handles them just fine. So I considered setting up a new MySQL DB with a table that has an index (we already have a file numbering system in place) and a BLOB field, then just putting the txt/pdf files in the BLOB. This would:
1. allow MySQL to act as a filesystem
2. provide one, or a small number of, actual files to be placed on the RAID array (i.e. the MySQL table files would be stored there instead of the individual pdf/txt files). A rough sketch of what I mean is below.
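Something like this is what I have in mind; the table and column names are just placeholders, and MEDIUMBLOB is my guess given the ~200kB average PDF size (its 16MB cap leaves plenty of headroom):

    -- one row per document; file_id comes from our existing numbering system
    CREATE TABLE archive (
        file_id  INT UNSIGNED NOT NULL PRIMARY KEY,
        pdf_data MEDIUMBLOB   NOT NULL,  -- PDFs average ~200kB
        txt_data MEDIUMBLOB   NOT NULL   -- extracted text, ~2kB
    ) ENGINE=InnoDB;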
So, here are the questions:
1. would this be faster or slower?
2. should I build one table like: column: primary index, column: pdf (BLOB), column: txt (BLOB), or split it into two tables, one for the PDFs and one for the corresponding txt files? (Asking from a pure performance standpoint; the split version I'm picturing is sketched below.)
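Again just a sketch with placeholder names:

    CREATE TABLE archive_pdf (
        file_id  INT UNSIGNED NOT NULL PRIMARY KEY,
        pdf_data MEDIUMBLOB   NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE archive_txt (
        file_id  INT UNSIGNED NOT NULL PRIMARY KEY,  -- same file_id ties the two together
        txt_data MEDIUMBLOB   NOT NULL
    ) ENGINE=InnoDB;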
3. I would be using InnoDB tables to make use of row-level locking, and because we won't ever be doing text searches. Is there a way to break the data file into multiple files without manually splitting the actual table (i.e. table: pdf 0-1000000, table: pdf 1000001-2000000, etc.)? One possibility I've come across is sketched below.
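From what I've read, MySQL 5.1's range partitioning might do exactly this, and with innodb_file_per_table enabled each partition should get its own file on disk. This is just my reading of the docs, not something I've tested:

    -- in my.cnf: innodb_file_per_table = 1
    CREATE TABLE archive_pdf (
        file_id  INT UNSIGNED NOT NULL PRIMARY KEY,
        pdf_data MEDIUMBLOB   NOT NULL
    ) ENGINE=InnoDB
    PARTITION BY RANGE (file_id) (
        PARTITION p0 VALUES LESS THAN (1000000),
        PARTITION p1 VALUES LESS THAN (2000000),
        PARTITION p2 VALUES LESS THAN MAXVALUE
    );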
4. related to the above, should I consider splitting the tables to avoid a single 1TB table, or is that not a big deal? i.e. would the DB suffer in performance due to the sheer size of the table file?
5. would a 9 million record table with ~200kB BLOBs perform worse, purely because of the data size, than a 9 million record table holding a very small amount of data (say INT(1))? i.e. can I expect the kind of performance I get with my other tables?
Thanks for all your help.