dustyny wrote: Sigh... First off, nothing about this project has anything to do with active memory dedupe. Opendedup is a file system (SDFS), and that 800 MB/s write figure only applies to their project. Since SDFS isn't in the mainline Linux kernel, they have to use FUSE, which runs in user space and carries heavy performance penalties.
Here is one more link
http://blogs.technet.com/b/filecab/arch ... -2012.aspx
Quotes from it:
When copying a single large file, we see end-to-end copy times that can be 1.5 times what it takes on a non-deduplicated volume.
When copying multiple large files at the same time we have seen gains due to caching that can cause the copy time to be faster by up to 30%.
Under our file-server load simulator (the File Server Capacity Tool) set to simulate 5000 users simultaneously accessing the system we only see about a 10% reduction in the number of users that can be supported over SMB 3.0.
Data can be optimized at 20-35 MB/Sec within a single job, which comes out to about 100GB/hour for a single 2TB volume using a single CPU core and 1GB of free RAM. Multiple volumes can be processed in parallel if additional CPU, memory and disk resources are available.
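A quick sanity check of the quoted rate, in Python (my own arithmetic, not from the article):

```python
# The article quotes 20-35 MB/s per optimization job; take the mid-range.
rate_mb_s = 28
# Convert MB/s to GB/hour.
gb_per_hour = rate_mb_s * 3600 / 1024
print(round(gb_per_hour))  # -> 98, i.e. roughly the quoted 100 GB/hour
# At that rate, a full 2 TB volume takes about a day of background work.
hours_for_2tb = 2048 / gb_per_hour
print(round(hours_for_2tb))  # -> 21 hours
```

So the "100GB/hour" figure is consistent with the stated per-job throughput.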
Please provide other speed tests if you have any.
I have no idea how it works there; VMware Server and WS dedupe RAM pretty badly. 16 clones of the same VM with 512 MB of RAM each eat about 8 GB in total. Maybe ESX does better, but it is Linux-based, and right now we are not talking about Linux solutions; you can tweak Linux much more than Windows. KVM in kernel virtualization mode, I suppose, does not dedupe but reuses the kernel and libraries directly, though I may be wrong. And yes, it can still make the cache useless if you need speeds higher than an SSD RAID 0. Under a VM I was not able to reach more than 750 MB/s for disk operations, even on RAM drives.
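The arithmetic behind that observation (my own illustration, not VMware's published numbers):

```python
# 16 identical clones, 512 MB of RAM each.
clones = 16
ram_per_vm_mb = 512

# Observed total: the full footprint, i.e. effectively no page sharing.
no_dedup_gb = clones * ram_per_vm_mb / 1024
print(no_dedup_gb)  # -> 8.0 GB

# With ideal page sharing across identical guests, the shared pages
# would collapse toward a single copy.
ideal_shared_gb = ram_per_vm_mb / 1024
print(ideal_shared_gb)  # -> 0.5 GB
```

The gap between 8 GB observed and the ideal 0.5 GB is what I mean by "dedupes RAM pretty badly".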
As opposed to what? You imply you have experience, so give examples. What other levels have you used dedupe at, and why is it better to do it at the file-system level?
As opposed to your idea: dedupe in a block-level cache. It is better to leave it to the file system (NTFS in my case); Microsoft is much more popular and has far more resources to provide good support and build nice tools. And if the block-level cache sits between the dedupe and the hardware IOPS, then the cache will contain only unique data. I'm not sure that is how it works, but it looks logical to me. In the opposite situation we would perform two dedupes: if NTFS does its own and then FC does its own, that looks like overhead and bad design to me.
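To illustrate the layering argument (a toy model, not any real product's implementation): if dedupe runs above the block cache, only unique blocks ever reach the cache, so it never wastes space on duplicates.

```python
import hashlib

class BlockCache:
    """Toy block-level cache sitting below the dedupe layer."""
    def __init__(self):
        self.blocks = {}  # physical block id -> data

    def store(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data

class DedupLayer:
    """Toy file-system dedupe: identical content maps to one physical block."""
    def __init__(self, backing_cache: BlockCache):
        self.index = {}  # content hash -> physical block id
        self.cache = backing_cache
        self.next_block = 0

    def write(self, data: bytes) -> int:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.index:
            # Only never-seen content is passed down to the cache.
            self.index[key] = self.next_block
            self.cache.store(self.next_block, data)
            self.next_block += 1
        return self.index[key]

cache = BlockCache()
fs = DedupLayer(cache)
for block in [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]:
    fs.write(block)
print(len(cache.blocks))  # -> 2: the cache below holds only unique data
```

If instead the cache did its own second dedupe pass, it would re-hash blocks the file system already deduplicated, which is the overhead I am objecting to.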
P.S. I have used NTFS with offline dedupe for VMs. It works fine for me. I can't provide examples of your idea, but it looks illogical to me.
P.P.S. Likewise, if you don't start using facts and supporting links, I will lose interest in talking to you. Please keep that in mind if I ignore your messages.