Best DB to storage files

PLCITS used Ask the Experts™
Hi We are using Visual Studio and we want to start storage PDF or some other files in DB but we would like to know which is the best DB to do it?

we have SQL, Mysql, and MongoDB    if you know other, please advice
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
John TsioumprisSoftware & Systems Engineer

Based on a similar search sometime ago the best should be Firebird
For start a 460GB Single database seems quite big and in paper it can go much larger
The database file size limits are:

32 TB for Firebird 2.x
64 TB for Firebird 3.x
I would strongly recommend you don't do this at all. Almost everyone who does this ends up going away from that design later on, which usually means a painful transition (because they start seeing the problems AFTER there is a lot of real-world data).

Some common issues:
1. It's easy to build some bad queries early on that perform fine until there's lots of data.

2. Compared to normal queries, the data file queries use a relatively huge amount of bandwidth and memory, and keep the client connected for longer, which reduces your ability to scale to meet higher usage. Ideally, clients should be connecting, getting small amounts of precise data, and disconnecting, all in milliseconds. In database land, every kilobyte of data is precious.

3. It ties your data storage expansion to your database server. If you need to move to a bigger physical drive because DB-stored files are taking up too much storage then your whole DB might have to go offline to do it.

4. For hosting services, DB server instances are often a LOT pricier than storage instances. So upgrading the storage on the DB server might cost you many times more than upgrading a storage server instance. Usually you want premium disk IO for databases (extreme speed and reliability), while file storage servers are often fine to use cheaper, slower, higher capacity drives. If it takes a long time to stream data from a storage server, it's usually no big deal.

5. It makes it harder to back up your data. While it's trivial for most backup solutions to incrementally back up files and provide you with superior restore capabilities (a full snapshot at any time), it's much harder to do that with files in the database. You can have replication syncing to a second server, but that also includes replication of dangerous queries (e.g. delete file record from the master will delete it on the slave). Restoration of DB data is also much slower and more difficult for numerous reasons (compared to the filesystem).

6. It makes backing up the other, non-file data take longer, since most backup solutions take entire, full snapshots. So if you do a daily backup but you are only changing a few megabytes of data here and there each day, your backup would still likely include all the files (even if none of them changed that day). So that means more data storage needed, and longer-running backup threads taking up large amounts of your database resources every time you back up.

7. You mostly lose the advantages of your disk's caching mechanisms, which puts more load on the disk and database IO. It may also negatively impact other unrelated queries by taking up valuable space in the DB's query cache.

8. It puts more strain overall on the DB server, which could easily be separated to a different server for better load distribution.

8. It puts more strain on the client since the client has to keep more data within buffered memory (compared to being able to stream/ read it in small chunks to keep memory usage low).

That said, most people just store file metadata in the database and store the actual data on a separate, cheaper server that can use incremental, real-time backup services (e.g. crashplan and similar services).

It keeps things low-cost and gives you great flexibility and better resource utilization and better scaling.
Most Valuable Expert 2015
Distinguished Expert 2018
we would like to know which is the best DB to do it?

A file server. It is built and optimised for exactly this, and speed, cost, and scalability cannot be beaten by anything else - built on top of it as it will be.
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Just to add to what I was saying, if you DO store files into a database despite the recommendations (I suspect Gustav and I will not be the only ones to chime in with the same recommendation), then it likely doesn't matter too much. Different databases won't impact storage of raw data very much. Storing a PDF in MySQL will be pretty much the same as storing it in MS SQL or PostgreSQL or MongoDB or anything else. None of them have any real special handling when it comes to that.

You should really be picking the best database for the non-file data portion of your application instead. If you're not a database expert and just want something that will be easy and work with a basic LAMP-type stack, then MySQL / Maria will be the way to go and is often the starting place for a lot of people. If you've got the money and infrastructure for it, SQL Server has some really great performance and tools. MongoDB (or any of the other non-relational DBs) is a different beast altogether, so you'd have to determine whether it meets your data needs/expectations.
ste5anSenior Developer

Well, if you want to do this, then I would recommend that you look into SQL Server's FILESTREAM data type. See Binary Large Object (Blob) Data for the supported technologies.
Jim RiddlesPrepress/OMS Specialist

Here is a link to an excellent read on this very topic.  TL;DR  The file system is still the best overall choice.
i'd also advocate against doing this at all.

but if you really want to :
- mongodb is not good at storing unindexes large strings or blobs.
- sql server will do the trick at a huge disk and ram cost. and licence cost.
- mysql will behave decently for files in the 8k - 32k range and can be tweaked to work with bigger blobs at the cost of slowing other things down. forget tweaking if you have an existing db. but beware : many reads will degrade the server performance for other queries dramatically. that's a decent choice though if you want to store a few tens of thousands of invoices for example. do not even think of it if you store books or expect dozens of queries per second.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial