In our database we have a table that stores files as BLOBs. Due to the sorts of files people are adding, it's bloating our database very quickly. I'm in the process of implementing compression on the files, which is already gaining us around 30 GB, but it's come to my attention that, because of the way the code works, we quite often end up with duplicate files in different BLOB records.
We currently store the file name in the DB record, and I can easily determine the size of the BLOB using LENGTH(). I can test that for equality, but is there an alternative method I can use to establish whether files are identical?
Progress 11.2.1 on Windows
Perhaps you could add an indexed checksum field, and upon storage of a blob, calculate the checksum, and compare it to the other records to detect a duplicate.
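A minimal sketch of that idea in Python (not Progress ABL): compute a content checksum before storing a blob and look it up in the indexed checksum column to detect an existing copy. The `index` dict and `store_blob` helper here are hypothetical stand-ins for the real table and insert logic; SHA-256 from the standard `hashlib` module serves as the checksum.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a hex SHA-256 digest of the blob contents."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for the indexed checksum column: digest -> existing record id.
index: dict[str, int] = {}

def store_blob(record_id: int, data: bytes) -> int:
    """Store a blob unless identical content exists; return the id holding it."""
    digest = checksum(data)
    existing = index.get(digest)
    if existing is not None:
        return existing        # duplicate content: point at the existing record
    index[digest] = record_id  # new content: record its checksum
    return record_id

first = store_blob(1, b"report contents")
second = store_blob(2, b"report contents")   # identical bytes, deduplicated
third = store_blob(3, b"different contents")
```

Comparing the stored checksum first (and only falling back to a byte-by-byte compare on a match, if you want to guard against hash collisions) avoids reading whole blobs for every comparison.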
Thanks, that's the sort of thing I'm after. I'm introducing a CrossRef table, which means we should be able to get rid of the duplicates.