Deferred Writes - Power Loss + this is awesome.

FAQ, getting help, user experience about PrimoCache
BonzaiDuck
Level 7
Posts: 88
Joined: Wed Jan 11, 2017 12:57 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by BonzaiDuck »

Just about every caching solution I know of warns about possible data loss with "Deferred Writes", or, as it is called in Intel ISRT caching, the "Maximum" performance level. All of my systems have good UPS power backup, and all of the battery units are APC models with maximum wattage ratings well in excess of what is connected to them.

Lately, after another user on this forum said he felt "comfortable" with deferred writes because of his UPS, I decided to configure it, knowing that I could pause it during a boot-up session. I've had no problems with it.

The other thing one needs to consider is whether there is actually anything left to flush and write to disk at the time of a power outage. The outage could occur during some very intensive computer activity, or at a time when the system is idle. So the risk may vary, but I think it is lower for just that reason.
SolarDaveGreen
Level 1
Posts: 4
Joined: Fri May 18, 2018 6:40 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by SolarDaveGreen »

I have noticed on my laptop with a 2TB spinning hard drive that if I set the PrimoCache delayed write to even ONE or TWO seconds, disk activity becomes much, much lower.

I don't want to start a huge discussion, but my only worry is data loss on a Blue Screen of Death (BSOD).

However, when I get a BSOD, a file (the memory dump) is saved to the hard drive.

Doesn't that mean the disk buffer is flushed? Doesn't that mean the small amount of delayed-write data will also be flushed?

Is the code path that writes the PrimoCache data to disk separate from the code path that writes the BSOD dump data, or is it the same?

Thanks - I would really like to turn on 1-2 seconds of delayed write.
Support
Support Team
Posts: 3628
Joined: Sun Dec 21, 2008 2:42 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by Support »

The code path to write cache data to disk is different from that to write BSOD dump data.
Axel Mertes
Level 9
Posts: 180
Joined: Thu Feb 03, 2011 3:22 pm

Re: Deferred Writes - Power Loss + this is awesome.

Post by Axel Mertes »

I would use SSD READ caching only for your main host OS. Then use another cache task for all the VMs to enable read/write caching for them; you can rebuild those VMs easily if one is broken by a BSOD crash and lost write-cache data.

This gives you maximum performance.

You may consider using an Intel Optane 900P for caching rather than standard SSDs (Intel 3D XPoint memory).


@support:
That said, I can only underline how useful it would be to give the (deferred) write cache some kind of journaling option, so that not-yet-performed deferred writes can be picked up on a reboot. All this requires, after all, is a single bit for every deferred-write cache block that has not yet been written to the cached disk itself. After each deferred write completes, you mark those write-cache blocks as "safe". You basically have the logic there already, because you already track the deferred-write cache blocks. However, you keep that info in RAM, not on the SSD. Since you only write to the SSD, and to the HDD every now and then, persisting it would add just a little more overhead.
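The per-block "safe" bit described above could be sketched like this (a toy illustration only; all names are hypothetical and PrimoCache's actual internals are not public):

```python
class DeferredWriteJournal:
    """Tracks, per cached block, whether it has reached the backing HDD.

    In a real driver this bitmap would be persisted on the SSD alongside
    the cached data so it survives a crash; here a dict stands in for it.
    """

    def __init__(self):
        self.dirty = {}  # block_id -> True while the HDD copy is stale

    def record_deferred_write(self, block_id):
        # Data just landed in the SSD cache but not yet on the HDD:
        # mark it "not yet written to final HDD" by default.
        self.dirty[block_id] = True

    def record_flushed(self, block_id):
        # Deferred write to the HDD completed; mark the block "safe".
        self.dirty.pop(block_id, None)

    def unflushed_blocks(self):
        # What a boot-time recovery pass would need to replay.
        return sorted(self.dirty)

j = DeferredWriteJournal()
j.record_deferred_write(7)
j.record_deferred_write(3)
j.record_flushed(7)
print(j.unflushed_blocks())  # → [3]
```

The point is that the tracking logic is trivial; the hard part, as the support team notes later in the thread, is keeping the persisted copy of this bitmap in sync with the data without excessive SSD writes.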

Of course you can only have this kind of journaling for blocks that exist in the SSD cache. So for me the "perfect" write cache works as follows:

1. A RAM cache to combine several small writes into one bigger write to the SSD (that's what e.g. Condusiv Diskeeper IntelliWrite does).
2. An SSD cache to combine several small writes into one bigger flush of writes to the HDD/RAID.

I am not even sure whether 1) exists in PrimoCache as of now.
I would also be totally fine with having ONLY an SSD cache, without any RAM cache at all, for write caching. Why? Simple: if you have only SSD caching, the data is always written to a medium that survives BSODs etc. If the knowledge of which cache block needs to be written to which HDD block is also kept on the SSD, you can at least start implementing recovery on bootup.
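The two-level combining in steps 1 and 2 above can be sketched as a simple batcher that merges contiguous small writes into larger ones (pure illustration; the offsets and size cap are invented, and this is not PrimoCache's real batching):

```python
def coalesce_writes(writes, max_batch_bytes):
    """Merge adjacent small writes into larger batches, as in the
    two-level scheme: many tiny buffered writes become one bigger
    write to the next tier (SSD, or HDD/RAID).

    `writes` is a list of (offset, size) pairs in bytes.
    """
    batches = []
    current = None
    for offset, size in sorted(writes):
        if (current is not None
                and offset == current[0] + current[1]
                and current[1] + size <= max_batch_bytes):
            # Write is contiguous with the open batch: extend it.
            current = (current[0], current[1] + size)
        else:
            if current is not None:
                batches.append(current)
            current = (offset, size)
    if current is not None:
        batches.append(current)
    return batches

# Four 4 KiB writes, three of them contiguous, become two batches:
print(coalesce_writes([(0, 4096), (4096, 4096), (8192, 4096), (65536, 4096)],
                      max_batch_bytes=65536))
# → [(0, 12288), (65536, 4096)]
```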

However, all the above only ensures that blocks which were successfully written to the SSD, together with the information on whether or not they still need to be written to the HDD/RAID (deferred write), can be recovered and the missing writes finished. It can save your life (well, your OS), but it is not guaranteed to. With the advent of cards like the Intel Optane 900P, SSD I/O is so much faster now that this becomes even more interesting to implement.

IMPORTANT though - in fact I believe the following:
If I encounter a BSOD on an HDD-based system, I am more likely to lose data than on an HDD system that includes a PrimoCache SSD level-2 deferred-write cache. Why? Because the SSD is faster at writes, so it is more likely that important blocks really got saved to the SSD than they would have been directly to the HDD. However, it helps nothing if the data was successfully written to the SSD but cannot be recovered.

Bottom line:
Make sure that every deferred-write block you cache on the SSD is marked "not yet written to final HDD" by default. Then, when you perform the deferred write from SSD to HDD, you add a write to the SSD recording "block has been written to final HDD".

This way the driver could look up that info on the cache drive at bootup and at least ask the user whether outstanding cache writes should be performed for the sake of data consistency. This should be fairly simple, except that you need a mechanism to keep the data on the SSD reasonably current.

Just some thoughts...

Cheers
Axel
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by Jaga »

I am not certain whether "write coalescing" would make a drive cache faster, though it would make it more CPU/resource-efficient.

Personally, my ideal cache is one that operates at the highest possible speed, resources notwithstanding. It's a little like gaming on the internet, where you can choose to wait until a TCP buffer fills up and THEN send it (more efficient but much slower), or send as little as a few bytes as soon as they are ready (the least efficient but the absolute fastest).

It's a question of priorities, and of what the underlying hardware can support. If it can process tiny chunks as fast as you can send them (i.e. an SSD layer), then write coalescing isn't necessarily advantageous. In fact, if it holds data too long, it won't make chunks marked as flushed in the cache available for reuse, which is ultimately inefficient even from a pure speed perspective.

So while I agree with you in part, I also disagree. There has to be some kind of tuning mechanism for the aggressiveness of writes (much like our current delayed-write setting) and for whether or not to coalesce them.
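The tuning knob described here, the trade-off between flushing immediately and waiting to coalesce, is commonly implemented as a size-or-deadline trigger. A minimal sketch with hypothetical thresholds (not PrimoCache's actual policy):

```python
import time

class FlushPolicy:
    """Flush when EITHER a byte threshold OR a deadline is reached.

    A tiny max_bytes approximates the "send a few bytes as soon as
    they are ready" end of the spectrum; large values favour
    coalescing at the cost of latency and crash exposure.
    """

    def __init__(self, max_bytes, max_delay_s):
        self.max_bytes = max_bytes
        self.max_delay_s = max_delay_s
        self.pending = 0      # bytes buffered but not yet flushed
        self.oldest = None    # arrival time of the oldest pending byte

    def add(self, nbytes, now=None):
        now = time.monotonic() if now is None else now
        if self.oldest is None:
            self.oldest = now
        self.pending += nbytes

    def should_flush(self, now=None):
        now = time.monotonic() if now is None else now
        if self.pending == 0:
            return False
        return (self.pending >= self.max_bytes
                or now - self.oldest >= self.max_delay_s)

    def flushed(self):
        self.pending = 0
        self.oldest = None

p = FlushPolicy(max_bytes=1 << 20, max_delay_s=2.0)
p.add(4096, now=0.0)
print(p.should_flush(now=0.5))  # → False (buffer small and young)
print(p.should_flush(now=2.5))  # → True (2-second deadline reached)
```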
Axel Mertes
Level 9
Posts: 180
Joined: Thu Feb 03, 2011 3:22 pm

Re: Deferred Writes - Power Loss + this is awesome.

Post by Axel Mertes »

My ideas were not about altering the caching algorithm at all, but about adding functionality to survive a BSOD and recover as much data as possible, i.e. all data that has at least been written to the SSD write cache but not yet to the target disk.

What makes a cache strategy efficient or not is on a completely different level and a largely independent discussion.

I know that the above idea adds a bit of overhead, but it may be well worth it for some pro users who want the best compromise between using a write cache AND safety. As of now, using a write cache is more or less unsafe. That would change a lot with an SSD backlog of the write-cache index.
Axel Mertes
Level 9
Posts: 180
Joined: Thu Feb 03, 2011 3:22 pm

Re: Deferred Writes - Power Loss + this is awesome.

Post by Axel Mertes »

Just to reiterate:

1. In a system crash, it's more likely that more data got safely stored on an SSD write cache than on the target HDD, because the SSD write cache is far faster.

2. Data safely written to the SSD write cache is completely useless after a system crash if you cannot perform the unfinished flush to the HDD on reboot.

3. If you can pick up the SSD write-cache state during reboot and finish the flush to the HDD, that would be a major improvement, even safety-wise, over an HDD-only system without any write caching enabled.
Support
Support Team
Posts: 3628
Joined: Sun Dec 21, 2008 2:42 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by Support »

Axel Mertes wrote:1. In a system crash, it's more likely that more data got safely stored on an SSD write cache than on the target HDD, because the SSD write cache is far faster.
That's true.

We still have some problems to solve before we can reach 100% (or close to 100%) safety, but we will work on it.
Axel Mertes
Level 9
Posts: 180
Joined: Thu Feb 03, 2011 3:22 pm

Re: Deferred Writes - Power Loss + this is awesome.

Post by Axel Mertes »

support wrote:
Axel Mertes wrote:1. In a system crash, it's more likely that more data got safely stored on an SSD write cache than on the target HDD, because the SSD write cache is far faster.
That's true.

We still have some problems to solve before we can reach 100% (or close to 100%) safety, but we will work on it.
In that context I can only point to Condusiv Diskeeper/IntelliWrite. They use RAM caching only at this point (SSD is planned, from what I have heard). They combine writes in RAM to reduce overhead, and they mark small block segments as unavailable to reduce fragmentation a bit and use larger block segments first.

Anyhow, they tell their customers that their approach is safe. But it is still caching.

I can only think that it is faster on average with combined writes than with individual writes, and safer because you don't lose as much data as you would with the longer latency of individual writes. That would make sense.

Safety is only as good as the transaction model. If we can prove that in most cases it's safer to use an SSD cache than direct HDD access (given a working pick-up of the remaining disk-flush operations after a BSOD or similar reboot), it is IMHO totally fine to use it. One can increase safety on the hardware side, such as using an SSD with a battery or big capacitor, a UPS, multiple PSUs etc., but that has nothing to do with software.

After all, I think you (Romex Software) are more anxious than you need to be. If you can store your cache index (write-only, or both read and write) on the SSD as well (whenever you flush data from RAM to SSD...), then you are almost done, IMHO. You just need to pick up that info during boot and perform the necessary steps.

You might ask the user, for the rare case that the system crashed and the drives were modified in a third-party system before the next reboot. You might even find a way to detect that, such as keeping a hash of the MFT or something like that, I don't know. But that would be extremely advanced and IMHO not necessary. For me it would surely be enough if such a system found unflushed write-cache blocks on the SSD during boot and told me something like this:

"123 write cache blocks found that have not yet been flushed to disk. Perform flush to disk now? (Y/N)"

That's all I need.
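The boot-time pick-up described above could look roughly like this, with the journal scan, the (Y/N) prompt, and the replay as stand-in callables (everything here is hypothetical illustration, not an actual driver interface):

```python
def recover_on_boot(journal, flush_block, ask_user):
    """Scan the persisted journal for blocks never flushed to the HDD,
    ask the user whether to replay them, then flush and return the
    number of blocks written.

    `journal` is an iterable of (block_id, flushed) records;
    `flush_block` and `ask_user` stand in for driver and UI hooks.
    """
    pending = [bid for bid, flushed in journal if not flushed]
    if not pending:
        return 0
    prompt = (f"{len(pending)} write cache blocks found that have not yet "
              f"been flushed to disk. Perform flush to disk now? (Y/N)")
    if not ask_user(prompt):
        return 0  # user declined; leave the journal untouched
    for bid in pending:
        flush_block(bid)
    return len(pending)

# Example: three journal entries, one already flushed, user answers yes.
journal = [(1, True), (2, False), (3, False)]
done = recover_on_boot(journal, flush_block=lambda b: None,
                       ask_user=lambda msg: True)
print(done)  # → 2
```

The subtle part, which the sketch glosses over, is the one Axel concedes: the journal on the SSD must itself be trustworthy at the moment of the crash.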
Support
Support Team
Posts: 3628
Joined: Sun Dec 21, 2008 2:42 am

Re: Deferred Writes - Power Loss + this is awesome.

Post by Support »

Well, since you mentioned the cache index, I have to say that one of the key problems is exactly the cache index. It is stored on the SSD; the problem is ensuring the "sync" between the index and the cache data. The index may keep changing because new write data arrives or because of the cache replacement algorithm. So on every index change you would have to update the SSD immediately, which could mean a huge amount of writing to the SSD. Besides, it is possible that during a sudden power loss the latest index was not yet updated to the SSD. Then, during the recovery stage, a stale index on the SSD could cause a disaster.
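One common way around the index/data sync problem described above is to make each index entry self-validating, for example by storing an epoch counter and a checksum of the cached payload, so that a stale or torn index entry is detected and discarded at recovery rather than replayed. A rough sketch with an invented record layout (not PrimoCache's format):

```python
import zlib

def make_index_record(block_id, hdd_lba, epoch, payload):
    """One index entry: where the cached block belongs on the HDD,
    when it was written (epoch), and a CRC over its payload."""
    return {
        "block_id": block_id,
        "hdd_lba": hdd_lba,
        "epoch": epoch,
        "crc": zlib.crc32(payload),
    }

def valid_records(records, payloads, current_epoch):
    """At recovery, keep only entries whose data matches the index.

    `payloads` maps block_id -> bytes actually found in the SSD cache.
    Entries from the future, or whose checksum disagrees with the data
    (a torn or unordered write), are silently dropped instead of being
    replayed onto the HDD.
    """
    good = []
    for rec in records:
        data = payloads.get(rec["block_id"])
        if data is None or rec["epoch"] > current_epoch:
            continue  # index points at data that never made it to SSD
        if zlib.crc32(data) != rec["crc"]:
            continue  # data and index disagree: do not replay
        good.append(rec)
    return good

recs = [make_index_record(1, 100, 5, b"abc"),
        make_index_record(2, 200, 5, b"xyz")]
payloads = {1: b"abc", 2: b"OLD"}  # block 2's data never hit the SSD
print([r["block_id"] for r in valid_records(recs, payloads, current_epoch=5)])
# → [1]
```

Self-validating entries also relax the "update the SSD on every index change" requirement: a slightly stale index then loses a few recent writes instead of corrupting the disk.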