Data loss with defer write

Nick7 · Post by **Nick7** » Tue Sep 22, 2020 11:28 am

I know there's another thread: viewtopic.php?t=4954 but I want to add this here.

I also know it states that using defer write risks data loss. I agree this is a fact when L1 is used for defer writes. But using L2 for defer writes should not lead to data loss, even if a hard crash occurs.

I did a simple test: I set L2 to 50GB for write and quickly filled it with defer write data. While L2 was destaging to disk, I hit the reset button.
When the OS was back up, PrimoCache showed 0 deferred blocks.
More worryingly, I did not receive any error or notification that anything bad had happened.
Next, I ran fsck on the FS; it reported that the FS was fine, with no errors.
However, trying to open many of the files caused errors (as one would expect).
But this could happen to people without them knowing! Basically, it's REALLY dangerous to use defer writes this way! You can get corrupt data without knowing it, and without being able to pinpoint which files are actually corrupt!

Now, back to defer write and 'fixing' this.

What needs to be done, or my suggestions:

Add an option to use only L2 for defer writes. Yes, L1 can 'cache' that content from L2, but L2 should be the one used for managing and keeping consistency. So a new option is needed: use L2 explicitly for defer writes.

Next is to use a mechanism (journaling?) to keep L2 always consistent, so it cannot get corrupted. Similar to what other FSs (like ZFS) do, or even better, what bcache does for writes.

Once we know L2 is consistent after a crash, the next step is to know how far we got in copying data to spinning rust. Once data has been submitted to the HDD and confirmation is received, that block can be marked as done and freed from the defer write list.
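To make the last two suggestions concrete, here is a minimal Python sketch of a journal that frees a deferred block only after the HDD write is confirmed. Everything in it (`JournalEntry`, `DeferredWriteJournal`, `destage`, the `limit` parameter that simulates a crash) is hypothetical and only models the idea in memory; it is not how PrimoCache or bcache is actually implemented.

```python
import zlib
from dataclasses import dataclass


@dataclass
class JournalEntry:
    seq: int          # order in which the write was accepted
    block: int        # target block number on the HDD
    data: bytes       # payload held on L2
    crc: int          # checksum used to detect a torn journal entry
    destaged: bool = False


class DeferredWriteJournal:
    """Hypothetical in-memory model of a persistent L2 journal."""

    def __init__(self):
        self.entries = []
        self.next_seq = 0

    def record_write(self, block, data):
        # The entry is journaled before the write is acknowledged.
        entry = JournalEntry(self.next_seq, block, data, zlib.crc32(data))
        self.entries.append(entry)
        self.next_seq += 1
        return entry.seq

    def confirm_destaged(self, seq):
        # Called only after the HDD has confirmed the write.
        for entry in self.entries:
            if entry.seq == seq:
                entry.destaged = True

    def pending(self):
        # Entries still owed to the HDD, skipping any with a bad checksum.
        return [e for e in self.entries
                if not e.destaged and zlib.crc32(e.data) == e.crc]


def destage(journal, disk, limit=None):
    """Copy pending entries to disk in sequence order.

    `limit` stops early to simulate a crash in the middle of destaging.
    """
    for i, entry in enumerate(journal.pending()):
        if limit is not None and i >= limit:
            break
        disk[entry.block] = entry.data
        journal.confirm_destaged(entry.seq)
```

After a simulated crash, calling `destage` again picks up the entries that were never confirmed, so destaging continues where it left off instead of discarding the deferred data.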

All this should help ensure that an unexpected crash or power loss does not cause this bad situation: data loss where you don't even know what was lost.

Jaga · Post by **Jaga** » Tue Sep 22, 2020 4:26 pm

If you hit the power button when Windows is writing to disk actively (no disk caching software) you risk corrupting that data or losing the write entirely. Doesn't matter if you have caching on or off, or are even using it.

Even Windows has a warning on drive policies in the management area (Enable write caching on the device): "Improves system performance by enabling write caching on the device, but a power outage or equipment failure might result in data loss or corruption". And that applies to any drive type.

I don't see how it can/should be any different in Primocache, since new writes have to be held somewhere until they are fully committed (flushed to disk).

Bottom line in my experience working with computers professionally since the 80's: if you expect abrupt and unplanned power loss on an active, running machine.. you can expect to have errors in the data as a result. The responsibility here in keeping data clean and healthy lies with stable hardware, a stable OS, and a UPS (backup power) for hard outages.

I know Primocache could employ some type of write journaling to try and avoid data loss, but that would massively impact performance for L1 caches, and slow down L2's. Perhaps Support wants to chime in on it since they know the overhead impact more than either of us.

Nick7 · Post by **Nick7** » Tue Sep 22, 2020 9:40 pm

@Jaga: it's quite different - any proper FS, including Windows with NTFS will work differently, or should I say 'properly'.
This means that if you flick the power button off, there will be loss of data. Depending on the type of filesystem, you will either have a corrupt FS, which you fix with chkdsk/fsck or similar, or you have a journaled FS, which will keep consistency but, ofc, again with some loss of data.
In the case of manual fixing, you KNOW something happened. In the case of an FS like ZFS, it will simply discard unfinished writes and keep everything consistent.

One major difference is this: PrimoCache can and will write data to disk out of order, from what I've seen. That's what deferred write does: it reduces the amount of writes.
Filesystems will write data in order. This is a huge difference, and it is imperative for a filesystem to keep consistency in case of power loss or similar.
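A small Python sketch of why ordering matters, under simplified assumptions: the block numbers, the two example writes, and the `apply_until_crash` helper are all hypothetical, and "sorted by block address" merely stands in for whatever reordering a cache might do.

```python
def apply_until_crash(writes, crash_after):
    """Apply the first `crash_after` writes to an empty disk, then 'crash'."""
    disk = {}
    for block, value in writes[:crash_after]:
        disk[block] = value
    return disk


# The filesystem issues the file data first, then the metadata pointing at it.
issued = [(900, "file data"), (10, "inode -> block 900")]

# Destaging in issue order: a crash leaves data without metadata (harmless).
in_order = apply_until_crash(issued, crash_after=1)

# Destaging sorted by block address: the metadata lands first, so a crash
# leaves an inode that points at a block which was never written.
by_address = apply_until_crash(sorted(issued), crash_after=1)
```

In the second case the metadata looks valid, so a consistency check can pass while the file contents are garbage, which matches the symptom described in the first post.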

But, in case of PrimoCache, you have:
1) Writing out of order to disk
2) Simply discarding L2 deferred writes

By removing data from L2 only AFTER it has been successfully written to the HDD, you can, in case of power failure, simply continue where you left off (instead of discarding all L2 delayed writes, as it currently does).
Many filesystems, and even some caching software (like bcache on Linux), do the same.
Ofc, this means you cannot use L1 for defer writes, since it's volatile memory, which is lost on power loss, unlike SSDs.

So yes, it can be done, and it should be done.

PS: I've been working professionally with computers since the '90s, and currently work primarily as a storage system administrator.

Jaga · Post by **Jaga** » Wed Sep 23, 2020 2:35 am

I'm just not sure what journaling would do to the performance on Primocache. And I don't think Windows itself has a very robust system in place, since I've run into loss of data countless times on ungraceful shutdowns with Windows on pretty much every desktop version over the years.

Primocache uses what's called "write coalescing" to discard unnecessary redundant re-writes. That's where the "Trimmed Blocks" statistic comes from in the UI.
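As a rough illustration of write coalescing, the Python sketch below keeps only the newest write per block within a deferred window and counts the writes that were dropped. The `coalesce` function and its return shape are hypothetical; whether the "Trimmed Blocks" counter is computed exactly this way is an assumption.

```python
def coalesce(writes):
    """Keep only the newest data per block.

    `writes` is a list of (block, data) tuples in the order they arrived.
    Returns the surviving writes and the number of superseded writes.
    """
    latest = {}
    for block, data in writes:
        latest[block] = data  # a later write to the same block replaces the earlier one
    trimmed = len(writes) - len(latest)
    return latest, trimmed


window = [(1, b"a"), (2, b"b"), (1, b"c")]
to_flush, trimmed = coalesce(window)
```

Here block 1 is written twice, so only its second version needs to reach the disk and one write is trimmed.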

If journaling could work for the L2 (specifically the L2, not the L1), then I'd welcome it. I personally prefer max performance with minimal/no overhead for my L1 implementations, but would welcome the added stability for the L2.

Post by **Support** » Wed Sep 23, 2020 3:02 am

As I said in the referenced topic, the cache index database might not be correctly updated during an ungraceful shutdown, so even with an SSD cache we still don't write back L2 cache data when rebooting after an ungraceful shutdown. Using corrupted indexing data could cause a disaster by overwriting good data at other positions.
We do consider something like a journaling system; however, performance is one factor we need to deal with. Besides, PrimoCache works at the disk block level, not the file system level. It doesn't know any file information. It's quite complicated to ensure correct indexing across unexpected shutdowns.
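One common way to avoid acting on a damaged index is to store a checksum with it and refuse write-back when the checksum does not match. The Python sketch below shows only that detection step; `seal_index`, `load_index`, and the JSON layout are hypothetical and say nothing about PrimoCache's real index format.

```python
import json
import zlib


def seal_index(index):
    """Serialize a {cache_slot: disk_block} mapping with a CRC32 prefix."""
    payload = json.dumps(index, sort_keys=True).encode()
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def load_index(blob):
    """Return the mapping, or None if the stored checksum does not match."""
    crc = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    if zlib.crc32(payload) != crc:
        return None  # index is damaged: do not write anything back
    return json.loads(payload)


sealed = seal_index({"7": 1234})
# Flip one bit in the last byte to imitate a torn or corrupted index.
damaged = sealed[:-1] + bytes([sealed[-1] ^ 0x01])
```

A checksum only detects corruption; recovering a usable index after a crash would still need something like the journaling discussed in this thread.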

Nick7 · Post by **Nick7** » Wed Sep 23, 2020 7:23 am

@Jaga: I agree completely with your last post. This is my idea (or should I say request) - to have journaling for L2 when using deferred writes. For L1, ofc, makes no sense - since in case of reboot/power loss you lose L1 content anyway.

@Support: I understand this would require some changes. However, this would only apply when using L2 for deferred writes. The performance penalty is still much smaller than the cost of not using L2 for such a scenario at all. There are a lot of people who do require consistency while trying to get the best possible performance.
And yes, PrimoCache works at block level, but that does not change the fact that it can be, and should be, done. Keeping L2 consistent, and knowing where you left off in case of failure, is really important.

Post by **Support** » Thu Sep 24, 2020 3:03 am

@Nick7, Yes, I agree with you.

Romex Software Forum
