Save files in database: good or bad?

Posted by onnodehaan on 21-Dec-2018 13:04

Hi guys

I'm currently looking at some code that stores files in a CLOB/BLOB in our OpenEdge database, or save them on the filesystem. Saving them in the database seems wrong to me.

The current decision to store files in the database, are driven by the fact that we can't easily access system when we deploy to an cloud environment. But I think that decision was perhaps not based on good arguments, because OpenEdge can of course always some part of the OS to read Pl-files, R'codes, etc. So configuring a folder to store files, can always be done.

Am I paranoid about saving files in the database? Or should I go ahead and refactor this part of our application? Love you hear you pro's and con's, since I'm a bit stuck in between the two options :-)

Posted by David Abdala on 21-Dec-2018 13:50

This is a subject that can't be answered: there are good reasons for both approaches, and both are good, or bad, based on each particular case.

What I always recommend is having the option: the application should be able to do both, and be able to change the configuration at any point in time, to chage from DB to filesystem or viceversa.

If having the option is "too much", then you have to make the choice based in your particular use case (is not the same storing employee face images, that storing PDF invoices or students photo-books).

Pros:

Filesystem:

-  not your problem (handled by OS)

- does not affects DB times (running nor backup/restore)

- easy to hack (replacing files, deleting, adding, etc)

DB:

- goes where the db goes (easy migration)

- integrity is "granted" (better at least)

- easy handling (find the record, have the file)

Cons:

Filesystem:

- integrity is not so easy to grant

- backup/restore "by hand"

- you have to find the file, after the record in the DB

- filesystem permissions can bite you.. more than once

DB:

- grows and grows a lot faster

- "slow" backup/restore

- some people says performance is affected, but I have no evidence of such thing, and should not happen

Conclusion: for me is exactly the same, "in abstract". When you analyze each particular case, then you can make a better choice.

In most situations I give the option (a configuration option having a filesystem path, or nothing to use the DB), because I can't know in advance the type of use the application will have. In case you can do both, at application level I mean, you need to "fall back" when you try to retrieve a file (if not in the DB, look for it in the filesystem, and viceversa).

I've done a lot of filehandling, in most cases I store them in the filesystem, because I can afford to loose them, but when the files are as important as the rest of the data, they go to the database (I loose everything or nothing).

I know my answer won't help you, but you should have a less biased opinion about it (from your post you are biased towards the filesystem) and make a more "agnostic" analysis..

All Replies

Posted by David Abdala on 21-Dec-2018 13:50

This is a subject that can't be answered: there are good reasons for both approaches, and both are good, or bad, based on each particular case.

What I always recommend is having the option: the application should be able to do both, and be able to change the configuration at any point in time, to chage from DB to filesystem or viceversa.

If having the option is "too much", then you have to make the choice based in your particular use case (is not the same storing employee face images, that storing PDF invoices or students photo-books).

Pros:

Filesystem:

-  not your problem (handled by OS)

- does not affects DB times (running nor backup/restore)

- easy to hack (replacing files, deleting, adding, etc)

DB:

- goes where the db goes (easy migration)

- integrity is "granted" (better at least)

- easy handling (find the record, have the file)

Cons:

Filesystem:

- integrity is not so easy to grant

- backup/restore "by hand"

- you have to find the file, after the record in the DB

- filesystem permissions can bite you.. more than once

DB:

- grows and grows a lot faster

- "slow" backup/restore

- some people says performance is affected, but I have no evidence of such thing, and should not happen

Conclusion: for me is exactly the same, "in abstract". When you analyze each particular case, then you can make a better choice.

In most situations I give the option (a configuration option having a filesystem path, or nothing to use the DB), because I can't know in advance the type of use the application will have. In case you can do both, at application level I mean, you need to "fall back" when you try to retrieve a file (if not in the DB, look for it in the filesystem, and viceversa).

I've done a lot of filehandling, in most cases I store them in the filesystem, because I can afford to loose them, but when the files are as important as the rest of the data, they go to the database (I loose everything or nothing).

I know my answer won't help you, but you should have a less biased opinion about it (from your post you are biased towards the filesystem) and make a more "agnostic" analysis..

Posted by James Palmer on 21-Dec-2018 13:55

Can't really add a huge amount to David's post above, but from experience, the impact on DB maintenance tasks is a big one to consider. I worked with a 500GB database, where 300GB was BLOBs of invoices (PDFs mostly). Backups, restores, disaster recovery, dump and load activities, all took a huge amount of time more than they should have done.

If you do go with Blobs, do make sure you keep them in a separate storage area.

Posted by ChUIMonster on 21-Dec-2018 14:33

I think that if you are going to be doing this sort of thing that putting the LOBs in a separate storage area is only a partial solution -- for all of the reasons that James mentioned you should go one step more and put them in  a *database* of their own (and that db should have dedicated storage areas for LOBs).

Also -- there are a number of stand-alone document management databases offered as "bolt-ons".  I would explore one of those options before deciding to roll my own.

Posted by Jeff Ledbetter on 21-Dec-2018 14:47

We save binary files in an OpenEdge database, but we do not use BLOBs. We store as RAW chunks (1 record per chunk) in a table that it is in a dedicate area. We were doing it this before BLOBs came about and it's proven to be simple and reliable.

We also save on disk as well via a configured "depot" that our application reads/writes to/from. We maintain a "pointer" record in the database so it can find its counterpart on disk. It's faster than above but there is the extra overhead of backing up but once it's in place it's no big deal.

If speed is not an issue, I would store as RAW chunks.

Posted by Peter Judge on 21-Dec-2018 15:22

If on-disk encryption of the files is an issue, then storing blobs in the DB and using TDE may be an appropriate solution, since you don’t need to worry about ensuring the encryption of the files and the DB.
 

Posted by gus bjorklund on 21-Dec-2018 16:04

in addition to the current BLOB and CLOB large object types, it might be useful to have another type: a "FLOB" wherein the value of a FLOB column is a pointer to a file in the database server's local filesystem. fetching a FLOB value would cause the contents of the file to be returned to the client in the same way as a BLOB value would be.

since the FLOB values are not stored in the database, transaction semantics of FLOBs would have to bit a bit "relaxed" compared to the other types. also, if file no longer exists, instead of an error a unknown value should probably be returned.

This thread is closed