L1/L2 Deduplication

Suggestions around PrimoCache
steveb
Level 4
Posts: 21
Joined: Fri Sep 17, 2021 7:58 am

L1/L2 Deduplication

Post by steveb »

Block-level deduplication could improve the read hit ratio of the L1 and L2 caches, depending on the workload. This would be similar to the Windows Server Data Deduplication feature, but applied to the L1 and L2 caches. The deduplication does not need to happen in real time, and it could be especially handy for prefetched data.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: L1/L2 Deduplication

Post by Support »

Thank you for the suggestion. I agree that this feature would free up cache space. However, deduplication takes extra CPU and time, so overall performance would be affected.
steveb
Level 4
Posts: 21
Joined: Fri Sep 17, 2021 7:58 am

Re: L1/L2 Deduplication

Post by steveb »

As with the configuration options in Windows Server Data Deduplication, the deduplication process does not need to run in real time. It can run slowly in the background when the system is idle. Reading deduplicated data should not introduce additional performance issues.
vlbastos
Level 4
Posts: 21
Joined: Sat Jul 16, 2022 10:32 pm

Re: L1/L2 Deduplication

Post by vlbastos »

Deduplication doesn't need to happen in real time. Following the principles of Windows Data Deduplication:

1. Optimization should not get in the way of writes to the disk: Data Deduplication optimizes data by using a post-processing model. All data is written unoptimized to the disk and then optimized later by Data Deduplication.

2. Optimization should not change access semantics: Users and applications that access data on an optimized volume are completely unaware that the files they are accessing have been deduplicated.

You could run deduplication on a schedule or when the system is idle. Reads and writes would work as they do now; the only difference is the read mapping for deduplicated data. It's all about the mapping layer between the deduplicated data store (the chunk store) and the read operation.
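
To make the mapping-layer idea concrete, here is a minimal Python sketch of a post-processing model for a block cache. It is only an illustration under my own assumptions, not how PrimoCache or Windows Data Deduplication is implemented: the class `DedupCache`, its methods, and the use of SHA-256 to detect identical blocks are all hypothetical names and choices. Writes store data unoptimized, a separate `dedup_pass` (meant to run on idle or on a schedule) merges identical blocks, and reads always go through the address-to-chunk map.

```python
import hashlib


class DedupCache:
    """Toy block cache with post-process deduplication (hypothetical design)."""

    def __init__(self):
        self.block_map = {}   # logical block address -> chunk id
        self.chunks = {}      # chunk id -> block data (the "chunk store")
        self.refcount = {}    # chunk id -> number of addresses mapped to it
        self.by_digest = {}   # content digest -> canonical chunk id
        self.digest_of = {}   # canonical chunk id -> its content digest
        self.pending = set()  # chunk ids written but not yet deduplicated
        self.next_id = 0

    def _release(self, chunk_id):
        # Drop one reference; free the chunk once nothing maps to it.
        self.refcount[chunk_id] -= 1
        if self.refcount[chunk_id] == 0:
            del self.chunks[chunk_id]
            del self.refcount[chunk_id]
            self.pending.discard(chunk_id)
            digest = self.digest_of.pop(chunk_id, None)
            if digest is not None:
                del self.by_digest[digest]

    def write(self, lba, data):
        # Writes are stored unoptimized: no hashing on the write path.
        if lba in self.block_map:
            self._release(self.block_map[lba])
        chunk_id = self.next_id
        self.next_id += 1
        self.chunks[chunk_id] = bytes(data)
        self.refcount[chunk_id] = 1
        self.block_map[lba] = chunk_id
        self.pending.add(chunk_id)

    def read(self, lba):
        # Readers get the same bytes whether or not the block was deduplicated.
        return self.chunks[self.block_map[lba]]

    def dedup_pass(self):
        # Intended for idle time: merge pending chunks that hold identical data.
        todo = [(lba, cid) for lba, cid in self.block_map.items()
                if cid in self.pending]
        for lba, cid in todo:
            data = self.chunks[cid]
            digest = hashlib.sha256(data).digest()
            canonical = self.by_digest.get(digest)
            if canonical is None:
                # First chunk with this content becomes the canonical copy.
                self.by_digest[digest] = cid
                self.digest_of[cid] = digest
            elif canonical != cid and self.chunks[canonical] == data:
                # Byte-compare before merging, then remap and free the copy.
                self.block_map[lba] = canonical
                self.refcount[canonical] += 1
                self._release(cid)
        self.pending.clear()
```

For example, writing the same 4 KB block to two addresses leaves two chunks in the store until `dedup_pass()` runs; afterwards both addresses map to one chunk and `read()` still returns the same bytes for each. Overwriting one of those addresses later just drops a reference and creates a new unoptimized chunk, which is the bookkeeping that fast-changing cache contents would exercise.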

Besides, there are people like me who don't mind the extra CPU usage: I use it on a file server, where nothing else needs much CPU. Using a little more CPU now and then is no problem.

Edit: some links
https://docs.microsoft.com/en-us/window ... understand
https://en.m.wikipedia.org/wiki/Data_deduplication
https://web.archive.org/web/20191224020 ... ication-v2
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: L1/L2 Deduplication

Post by Support »

One difference from disk deduplication is that cache contents can change quickly because the cache size is limited. And since caching is used in a wide range of scenarios, deduplication would increase system complexity and could introduce instability.