I also know the documentation states that using deferred writes risks data loss. I agree this is true when using L1 for deferred writes, but using L2 for deferred writes should not lead to data loss even if a hard crash occurs.
I did a simple test: I set L2 to 50 GB for writes and quickly filled it with deferred-write data. While L2 was destaging to disk, I hit the reset button.
When the OS came back up, PrimoCache showed 0 deferred blocks.
More worryingly, I did not receive any errors or notifications that anything bad had happened.
Next I ran fsck on the filesystem - it reported the filesystem was fine, with no errors.
However, trying to open many of the files produced errors (as one would expect).
But this could happen to people without them ever knowing! Basically, it's REALLY dangerous to use deferred writes this way: you can end up with corrupt data without knowing it, and without being able to pinpoint which files are actually corrupt.
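To make the silent corruption visible, the test above can be repeated with checksums: record a digest of every file before the crash test, then compare after reboot. This is a small sketch of my own, not anything PrimoCache provides - any file whose digest changed (or that can no longer be read) is corrupt, even when fsck reports a clean filesystem:

```python
import hashlib
import os

def checksum_tree(root):
    """Walk a directory tree and return {relative_path: sha256_hex}.

    Run this before the crash test, save the result, and run it again
    after reboot; compare with diff_checksums() below.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except OSError:
                # File exists in the listing but cannot be read back.
                digests[os.path.relpath(path, root)] = "UNREADABLE"
                continue
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests

def diff_checksums(before, after):
    """Return the paths that are missing, unreadable, or changed."""
    return {path for path, digest in before.items()
            if after.get(path) != digest}
```

This pinpoints exactly which files were damaged, which is precisely the information the cache itself never surfaces.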
Now, back to deferred writes and 'fixing' this.
What needs to be done - or rather, my suggestions:
- Add an option to use only L2 for deferred writes. Yes, L1 can still 'cache' that content from L2, but L2 should be the one responsible for managing and keeping consistency. So a new option is needed: use L2 exclusively for deferred writes.
- Next, use a mechanism (journaling?) to keep L2 always consistent so it cannot get corrupted - similar to what other filesystems (like ZFS) do, or even better, what bcache does for writes.
- Once we know L2 stays consistent across a crash, the next step is knowing how far the copy to spinning rust had progressed. When data has been submitted to the HDD and its write confirmed, that block can be marked complete (and freed) from the deferred-write list.
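The last two suggestions can be sketched as a tiny write-ahead journal for the destage path. This is only an illustration of the idea; PrimoCache's internals are not public, so the class and the backing-store API here are entirely hypothetical. Each block gets an 'intent' record before the HDD write and a 'done' record only after the write is confirmed durable; after a crash, any intent without a matching done is still pending and gets destaged again:

```python
import json
import os

class InMemoryBacking:
    """Stand-in for the real backing HDD, for demonstration only."""
    def __init__(self):
        self.blocks = {}
    def write_block(self, block_id, data):
        self.blocks[block_id] = data
    def flush(self):
        pass  # a real device would issue a cache flush / FUA here

class DestageJournal:
    """Minimal write-ahead journal for destaging cached blocks.

    Protocol per block:
      1. append an 'intent' record and fsync the journal,
      2. write the block to the backing store and flush it,
      3. append a 'done' record and fsync the journal.
    """

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.fd = os.open(journal_path,
                          os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

    def _append(self, record):
        os.write(self.fd, (json.dumps(record) + "\n").encode())
        os.fsync(self.fd)  # record must be durable before we proceed

    def destage(self, block_id, data, backing):
        self._append({"op": "intent", "block": block_id})
        backing.write_block(block_id, data)
        backing.flush()  # only now is the HDD write confirmed
        self._append({"op": "done", "block": block_id})

    def pending_after_crash(self):
        """Replay the journal; return block ids never marked done."""
        pending = set()
        with open(self.journal_path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "intent":
                    pending.add(rec["block"])
                elif rec["op"] == "done":
                    pending.discard(rec["block"])
        return pending
```

The key property is ordering: the intent is durable before the HDD write starts, and the done record is only written after the HDD confirms the data, so a reset at any point leaves the journal able to tell exactly which blocks still need destaging - which is what would have saved my test data above.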