About Defer-Write and the risk of data loss on power outage

van
Level 1
Posts: 2
Joined: Tue Jan 26, 2021 1:18 pm

Post by van »

Hello, I am not clear about the risk of data loss on power outage with Defer-Write. I understand the why and when; I'm just not clear whether the risk applies only to the cached disk. Are the other connected disks affected in any way, even if they are not cached?

I mean, it's logical that only the cached disks are affected, but I'd be thankful if this could be clarified beyond doubt.


Jaga wrote: Sun Jan 24, 2021 3:43 am Additionally, you might want a slightly longer Defer-Write time if you have a UPS on the computer (Uninterruptible Power Supply). If you don't have one at all, know that you risk data loss by having Defer-Write turned on.
Hey mate, sorry to quote you here; I just didn't want to hijack that other thread.
Would you care to comment on my dilemma?

Support wrote: Sun Mar 25, 2018 12:52 pm Regarding PrimoCache Defer-Write, please note that so far it still has the risk of data loss on power outage, system crash/hang, even you only use the SSD caching.
Hello, can you please clarify whether the stated risk applies only to the cached disk, or whether all connected disks are affected, even if they are not cached with PrimoCache?
janusz521
Level 5
Posts: 51
Joined: Wed Aug 26, 2020 6:11 pm

Re: About Defer-Write and the risk of data loss on power outage

Post by janusz521 »

Only the cached disk. Also, the crash is usually not fatal. You lose the work done within the cache delay. I have witnessed it several times, and only once did I see a disturbance: a message about corrupted Chrome settings after the reboot. But some bad scenarios are possible if, for example, part of a file was saved to disk, overwriting an existing file, while the other part was still waiting in the cache. Most likely such a file will not be readable after reboot. But this is still quite a rare and unlikely case.
van
Level 1
Posts: 2
Joined: Tue Jan 26, 2021 1:18 pm

Re: About Defer-Write and the risk of data loss on power outage

Post by van »

Thanks mate... So, only the cached disks with Defer-Write are at risk.
I only want to try L1 cache for the SSD, and I don't worry much about that particular disk's content; it's all backed up and I don't keep any important data there. The content is OS/apps and games, and the games are also mirrored on a mechanical drive. It's the mechanical drives' content I worry about, and I don't plan to cache them.
Nick7
Level 5
Posts: 46
Joined: Sun Jun 25, 2017 7:50 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Nick7 »

The greatest problem with defer-write is out-of-order writes. This is what causes the greatest issue.
Out-of-order writes are used to maximize performance and reduce writes to a slow HDD.

Maybe an option to disable out-of-order writes could be added?
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Jaga »

Nick7 wrote: Fri Jan 29, 2021 8:58 am Greatest problem with defer-write is out of order writes. This is what causes greatest issue.
Out of order writes are used to maximize and reduce writes to slow HDD.

Maybe option to disable out of order writes could be added?
Are you referring to "write coalescing"? If so, it is a highly useful feature that I have yet to experience problems with.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Support »

Nick7 wrote: Fri Jan 29, 2021 8:58 am Greatest problem with defer-write is out of order writes. This is what causes greatest issue.
Out of order writes are used to maximize and reduce writes to slow HDD.
Sorry, I don't understand very well. Could you give me more details? Thanks.
Nick7
Level 5
Posts: 46
Joined: Sun Jun 25, 2017 7:50 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Nick7 »

Jaga wrote: Fri Jan 29, 2021 10:04 pm Are you referring to "write coalescing"? If so, it is a highly useful feature that I have yet to experience problems with.
Yes.
In normal use it's quite a useful feature, I agree.
However, the problem is in the case of sudden power loss. If you write data in order, you can easily recover and discard the last blocks if needed. If you write to disk out of order, you have no consistency and can easily end up with lost and corrupt data without even knowing it!
I actually tested this: start some data copy/edit with deferred writes and forcefully reboot the machine (unplug power, hard reset or similar).
Even if the filesystem check completes fine, data may get corrupted (in my case it did!).
Support wrote: Sun Jan 31, 2021 1:53 am Sorry, I don't understand very well. Could you give me more details? Thanks.
As I explained to Jaga, the problem is when data is written to disk out of order. In case of a crash, deferred writes are simply dropped (which makes sense for L1 since it's in memory; for L2, not so much).
The only way this could be avoided is by using the L2 cache for deferred writes and journaling it.
In case of a hard crash, the software (PrimoCache) would run through the complete L2 transaction log, and there would be no errors (data loss) on disk.
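
The journal-replay idea can be sketched roughly in Python. This is only an illustration under assumptions: the journal layout (a list of entries with `block`, `data` and `committed` fields) and the function name `replay_journal` are hypothetical and are not taken from PrimoCache or any real cache implementation.

```python
def replay_journal(disk, journal):
    """Apply committed journal entries to a copy of the disk, in log order.

    Hypothetical layout: each entry is a dict with 'block', 'data' and
    'committed'. Replay stops at the first entry that was not fully
    committed before the crash, so later entries are discarded.
    """
    recovered = dict(disk)
    for entry in journal:
        if not entry["committed"]:
            break
        recovered[entry["block"]] = entry["data"]
    return recovered


# State of the slow disk at the moment of the crash.
disk = {1: "a0", 2: "b0"}

# Deferred writes that were logged on the (persistent) L2 device.
journal = [
    {"block": 1, "data": "a1", "committed": True},
    {"block": 3, "data": "c1", "committed": True},
    {"block": 2, "data": "b1", "committed": False},  # torn by the crash
]

recovered = replay_journal(disk, journal)
print(recovered)
```

Because the log is replayed in the order the writes were issued, the disk ends up in a state the filesystem could actually have produced, only slightly older than the moment of the crash.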
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Support »

Now I understand, thank you, Nick7.
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: About Defer-Write and the risk of data loss on power outage

Post by InquiringMind »

Nick7 wrote: Fri Jan 29, 2021 8:58 am Greatest problem with defer-write is out of order writes. This is what causes greatest issue...
Haven't out-of-order writes been a feature of NTFS since its inception, plus disk interfaces like SCSI? (with features like elevator seeking and write queuing). PrimoCache may add an extra level with Defer-Writes, but you couldn't get rid of this completely.

The symptoms you describe seem more likely to be due to metadata (MFTs, security descriptors, etc) not being updated properly. NTFS has supported journaling from the very start (where changes are written to a special area first, before they are made, allowing for Windows to pick up where things are left off after any unexpected interruptions) making it more robust to power outages than *shudder* FAT32, but this isn't a 100% guarantee.

Getting back to the original question, data on non-cached drives linked to cached data (e.g. an NTFS junction to a PrimoCached volume) could be at risk also in the event of a crash/power outage, but that is more of an exception than the rule. I've yet to encounter data loss myself that I could tie to PrimoCache, though I have found it prudent after a crash to boot off a secondary partition and run a chkdsk on the primary one.

There are however plenty of other ways data could be lost or corrupted, so regular backups are very much your friend here - and consider automated file versioning of critical/frequently updated files to supplement this using software like Aphar Backup (Dutch website, but program runs in English) - for alternatives, see the end of this previous post.
Nick7
Level 5
Posts: 46
Joined: Sun Jun 25, 2017 7:50 am

Re: About Defer-Write and the risk of data loss on power outage

Post by Nick7 »

InquiringMind, NTFS does not support out-of-order writes. What it does support is journaling, which is a completely different thing. Windows also supports delayed writes, but they are journaled and in order.

What is an out-of-order write?
Example:
* You do some I/O, and the blocks that need to be written are, in order: block 1, block 2, block 1, block 3, block 1, block 4, so it's 1,2,1,3,1,4
What each OS is going to do is write in that exact order: 1,2,1,3,1,4

What PrimoCache is going to do is change the order before writing to disk to: 2,3,1,4
This is because 'block 1' is changed multiple times, and since it has not been written yet, PrimoCache writes it only once, after its last change, to reduce I/O.

However, let's consider a situation where a crash occurs, and it happens exactly when 'block 3' is being written to disk.

On a 'regular' system, the blocks written would be 1,2,1, and after that '3,1,4' would be lost due to the crash.
With PrimoCache, only 'block 2' would have been written; 'block 1', which should already have been written twice, would not be changed on the disk itself, and its updates would be lost.
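
The example above can be reproduced with a small Python simulation. It is a sketch under assumptions: the coalescing rule used here (one write per block, ordered by each block's last update) is chosen only because it yields the 2,3,1,4 order from the example; it is not PrimoCache's actual flushing logic, and the function names are made up.

```python
def coalesced_writes(requests):
    """Keep one write per block, ordered by each block's LAST update.

    Assumption: this rule is picked to match the 2,3,1,4 example;
    a real cache may order its flushes differently.
    """
    last_seen = {}
    for position, block in enumerate(requests):
        last_seen[block] = position
    return sorted(last_seen, key=last_seen.get)


def blocks_on_disk(write_order, crash_block):
    """Blocks fully written before the crash hits the first write of crash_block."""
    completed = write_order.index(crash_block)
    return set(write_order[:completed])


requests = [1, 2, 1, 3, 1, 4]          # order issued by the filesystem
in_order = list(requests)              # what a 'regular' system writes
coalesced = coalesced_writes(requests)

# Crash exactly while 'block 3' is being written.
regular_state = blocks_on_disk(in_order, 3)
cached_state = blocks_on_disk(coalesced, 3)

print("coalesced order:", coalesced)
print("regular system, blocks on disk:", regular_state)
print("coalescing cache, blocks on disk:", cached_state)
```

In the in-order case, block 1 has reached the disk before the crash; with coalescing, block 1 has not been touched at all, even though the filesystem issued it first.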

What does this mean in practice, in a real-world scenario?

This means that some blocks might not be written to disk in the order that the filesystem expects. This is due to PrimoCache being a layer between the filesystem and the disk itself.
The next thing this means: with deferred writes in PrimoCache, if the deferred-write cache was not empty at the time of a crash, there is data loss. Data loss is expected, and that is fine. However, data was NOT written to disk in order, so some data that would be expected to be on disk is missing.
This means that in some cases 'chkdsk' might say the filesystem is fine, or repaired either by a 'full scan' or just by replaying logs, but in reality you might have missing changes on disk, which means data corruption.

Again, what I tested was making changes to some files, or copying files, or deleting and copying files. When using deferred writes, I could get errors afterwards, in the following way:
* chkdsk said the FS is fine
* I was actually missing some files, or some existing files were corrupt. Worst of all, it's silent corruption, as chkdsk does not know there is corruption.
Filesystems such as ZFS would report errors due to checksum mismatches, but NTFS does not support that.
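
How a per-block checksum exposes this kind of silent corruption can be sketched in Python. This is an illustration only: SHA-256 and the helper names are assumptions, it does not reflect ZFS's actual on-disk format, and it assumes the expected checksums were recorded separately from the data blocks they cover.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of one block (SHA-256 chosen only for the illustration)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_blocks(on_disk, expected):
    """Return the ids of blocks whose content does not match the recorded checksum."""
    return sorted(
        block_id
        for block_id, data in on_disk.items()
        if checksum(data) != expected[block_id]
    )


# What the filesystem believes it wrote.
intended = {1: b"block 1, third update", 2: b"block 2", 3: b"block 3"}
expected = {block_id: checksum(data) for block_id, data in intended.items()}

# What actually reached the disk: the update to block 1 was lost in the crash.
on_disk = {1: b"block 1, stale content", 2: b"block 2", 3: b"block 3"}

corrupt = find_corrupt_blocks(on_disk, expected)
print("blocks failing verification:", corrupt)
```

A structure-only check like the one described above would not notice the stale block, while a content checksum flags it on the next read or scrub.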

Bottom line: deferred writes can speed things up on slow HDDs, but they carry quite a dangerous risk. Worst of all is the silent corruption you can end up with.

This could all be mitigated if deferred writes could optionally use only L2, with journaling so that the deferred-write buffer could be recovered/emptied after a crash. This is an option that, for example, bcache on Linux has.

Another option is the possibility in PrimoCache to disable out-of-order writes (aka data consolidation, as it is called in PrimoCache), so in the previous example 'block 1' would be written multiple times, and in case of a crash the data is written in order and filesystem checks can repair the filesystem to a proper state.