
L1/L2 Deduplication

Posted: Fri Sep 17, 2021 9:57 am
by steveb
Block-level deduplication could improve the read hit ratio of the L1 and L2 caches, depending on the workload. It would be similar to the Windows Server Data Deduplication feature, but applied to the L1 and L2 caches. The deduplication processing would not need to happen in real time, and it could be especially handy for prefetched data.

Re: L1/L2 Deduplication

Posted: Sat Sep 18, 2021 4:21 am
by Support
Thank you for the suggestion. I agree that this feature would free up more cache space. However, the deduplication process would cost extra CPU and time, so overall performance would be affected.

Re: L1/L2 Deduplication

Posted: Sat Sep 18, 2021 9:52 am
by steveb
As with the configuration options in Windows Server Data Deduplication, the deduplication process does not need to run in real time. It can run slowly in the background when the system is idle. Reading deduplicated data should not introduce additional performance issues.
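To make the idea concrete, here is a minimal sketch of a deferred pass. Everything in it is hypothetical (the `BlockCache` class, its fields, and `dedup_pass` are illustration names, not anything from the product): writes store blocks as-is with no hashing, and a separate pass that the caller would trigger when the system is idle merges identical blocks.

```python
import hashlib


class BlockCache:
    """Toy cache (hypothetical): logical block address -> slot holding the data."""

    def __init__(self):
        self.lba_to_slot = {}  # logical block address -> slot id
        self.slots = {}        # slot id -> cached bytes
        self.refs = {}         # slot id -> number of LBAs pointing at it
        self._next_slot = 0

    def _drop_ref(self, slot):
        # Free a slot once no logical block refers to it any more.
        self.refs[slot] -= 1
        if self.refs[slot] == 0:
            del self.refs[slot], self.slots[slot]
            return True
        return False

    def write(self, lba, data):
        # Hot path: no hashing, the block is stored unoptimized in a new slot.
        old = self.lba_to_slot.pop(lba, None)
        if old is not None:
            self._drop_ref(old)
        slot = self._next_slot
        self._next_slot += 1
        self.slots[slot] = data
        self.refs[slot] = 1
        self.lba_to_slot[lba] = slot

    def read(self, lba):
        # Reads only follow the mapping; they never hash anything.
        return self.slots[self.lba_to_slot[lba]]

    def dedup_pass(self):
        """Deferred pass: merge identical slots and return how many were freed."""
        seen = {}  # digest -> first slot seen with that content
        freed = 0
        for lba, slot in list(self.lba_to_slot.items()):
            data = self.slots[slot]
            digest = hashlib.sha256(data).digest()
            canon = seen.setdefault(digest, slot)
            # Compare the bytes as well, to guard against hash collisions.
            if canon != slot and self.slots[canon] == data:
                self.lba_to_slot[lba] = canon
                self.refs[canon] += 1
                if self._drop_ref(slot):
                    freed += 1
        return freed
```

In this sketch, three cached blocks of which two are identical would occupy three slots right after being written and two slots after `dedup_pass()`. A later write to one of the merged addresses allocates a fresh slot, so the other address keeps its data. How idleness is detected is left out entirely.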

Re: L1/L2 Deduplication

Posted: Mon Jul 18, 2022 3:15 pm
by vlbastos
Deduplication doesn't need to happen in real time. Following the principles of Windows' Data Deduplication:

1. Optimization should not get in the way of writes to the disk: Data Deduplication optimizes data by using a post-processing model. All data is written unoptimized to the disk and then optimized later by Data Deduplication.

2. Optimization should not change access semantics: Users and applications that access data on an optimized volume are completely unaware that the files they are accessing have been deduplicated.

You could run deduplication on a schedule or when idle. Reads and writes should work as they do now; the only difference is the read mapping of deduplicated files. It's all about the mapping layer between the deduplicated data store (the chunk store) and the read operation.
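A rough sketch of what such a mapping layer could look like, under stated assumptions: fixed-size chunks (a simplification chosen to keep the example short), SHA-256 as the chunk key, and hypothetical names throughout (`ChunkStore`, `optimize`, `read_range`, `CHUNK_SIZE`). It is not how Windows or any cache product implements it.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, only to keep the example readable


class ChunkStore:
    """Hypothetical chunk store: each distinct chunk is kept once, keyed by digest."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)  # store only if not already present
        return digest

    def get(self, digest):
        return self.chunks[digest]


def optimize(store, payload):
    """Post-process step: split a payload into chunks and return its map of digests."""
    return [
        store.put(payload[i:i + CHUNK_SIZE])
        for i in range(0, len(payload), CHUNK_SIZE)
    ]


def read_range(store, stream_map, offset, length):
    """Serve a byte-range read by following the map into the chunk store."""
    first = offset // CHUNK_SIZE
    last = min((offset + length - 1) // CHUNK_SIZE, len(stream_map) - 1)
    buf = bytearray()
    for idx in range(first, last + 1):
        buf += store.get(stream_map[idx])
    start = offset - first * CHUNK_SIZE
    return bytes(buf[start:start + length])
```

The caller of `read_range` sees the same bytes it would get from the original data; the only extra work on the read path is the lookup through the map.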

Besides, there are people like me who don't mind the extra CPU usage: I use it on a file server, where nothing should need that much CPU. Using a little more CPU now and then is no problem.

Edit: some links
https://docs.microsoft.com/en-us/window ... understand
https://en.m.wikipedia.org/wiki/Data_deduplication
https://web.archive.org/web/20191224020 ... ication-v2

Re: L1/L2 Deduplication

Posted: Wed Jul 20, 2022 3:52 pm
by Support
One difference from disk deduplication is that cache contents can change quickly because the cache size is limited. Because caching is used in a wide range of scenarios, deduplication would increase system complexity and could introduce instability.