I recently moved my files to a new ZFS pool and took the chance to configure my datasets properly.
This led me to discover ZFS deduplication.
Most of my storage is used by my Jellyfin library (~7-8 TB), which is mostly uncompressed Blu-ray rips, so I thought I might be able to save some space by using deduplication in addition to compression.
Has anyone here used that for similar files before? What was your experience with it?
I am not too worried about performance. The dataset in question rarely changes, basically only when I add more media every couple of months. I also overshot my CPU target when originally configuring my server, so there is plenty of headroom there. I have 32 GB of RAM, which is not fully utilized either (and I would not mind upgrading to 64 GB too much).
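For a rough sense of the RAM cost, here is the back-of-the-envelope math I did, assuming the often-quoted rule of thumb of roughly 320 bytes of dedup table (DDT) per unique block; the real per-entry cost varies with the pool, so treat this as an estimate only:

```python
# Back-of-the-envelope DDT RAM estimate for an ~8 TiB dataset.
# ASSUMPTION: roughly 320 bytes of RAM per unique block, a commonly
# quoted rule of thumb; the actual per-entry cost depends on the pool.

dataset_bytes = 8 * 1024**4      # ~8 TiB of media
recordsize = 128 * 1024          # default 128K recordsize
ddt_entry_bytes = 320            # assumed per-entry cost

unique_blocks = dataset_bytes // recordsize
ddt_ram_gib = unique_blocks * ddt_entry_bytes / 1024**3

print(f"unique blocks:     {unique_blocks:,}")      # 67,108,864
print(f"estimated DDT RAM: {ddt_ram_gib:.1f} GiB")  # 20.0 GiB
```

If that 320-byte figure is in the right ballpark, the DDT for this dataset alone would sit around 20 GiB, so 32 GB total is tighter than it sounds.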
My main concern is whether it is actually useful. I suspect that, just given the sheer amount of data and the similarity in file type, there would statistically be a fair amount of block-level duplication, but I could not find any real-world numbers or experiences on that.
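The closest thing to a measurement I could come up with is to hash fixed-size chunks of a sample of the rips and count how many chunks repeat. This is only a rough sketch (the script name and paths below are just examples, the fixed 128K chunk size only mimics the default recordsize, and compression is ignored), so take the numbers with a grain of salt:

```python
#!/usr/bin/env python3
"""Rough duplicate-chunk check: hash fixed-size chunks of the given files
and count how many chunks occur more than once. The 128K chunk size mimics
the default ZFS recordsize; compression and actual on-disk layout are
ignored, so this is only a ballpark figure."""

import hashlib
import sys
from collections import Counter

CHUNK = 128 * 1024  # mimic the default 128K recordsize


def chunk_hashes(path):
    """Yield a SHA-256 digest for every 128K chunk of the file."""
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            yield hashlib.sha256(data).digest()


counts = Counter()
for path in sys.argv[1:]:
    counts.update(chunk_hashes(path))

total = sum(counts.values())
dupes = sum(c - 1 for c in counts.values() if c > 1)
if total:
    print(f"{total} chunks total, {dupes} would be deduplicated "
          f"({100 * dupes / total:.2f}%)")
else:
    print("usage: chunk_dupes.py FILE [FILE ...]")
```

Running something like `python3 chunk_dupes.py /tank/media/*.mkv` over a handful of rips should give a first impression; as far as I know, `zdb -S <pool>` can also simulate dedup on existing data without actually enabling it.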
You should maybe read about the use cases for deduplication before using it. Here’s one recent article:
https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/
If you mostly store legit Blu-ray rips, the answer is probably no, you should not use ZFS deduplication: each rip is its own unique encoded video stream, so identical blocks across different files are vanishingly rare and dedup would have almost nothing to merge.
I was also going to link this. I started using ZFS 10-ish years ago and used dedup when it came out, and it was really not worth it except for archiving a bunch of stuff I knew had gigs of duplicate data. The performance was just too poor.