Power lost, L2 contents lost

ml70
Level 2
Posts: 9
Joined: Sat Dec 07, 2019 6:31 am

Power lost, L2 contents lost

Post by ml70 »

Is this how it's supposed to be, or is there something wrong with my setup? I'm trialing PrimoCache 3.0.9.

RAM Read cache on, Write cache off. L2 Read and Write cache on SSD drive, Deferred write on (Intelligent), period Infinite.
I mucked up something (unrelated to PrimoCache) and had to force a power-off from the switch. I waited a long time, until there was no disk activity left, before finally powering off. Then I rebooted.

500+ GB of data on L2 cache is completely gone. Why is it gone? It was stored on L2 SSD just fine. It shouldn't disappear anywhere due to unscheduled power cycling, the SSD is not a volatile medium. Why did it? It wasn't even very fresh (90% over 24 hrs old), so a caching program should've had more than enough time to save a data structure to the disk as to what is what (sector mapping). PrimoCache doesn't do this?

In other news PC is a great program but the lack of genuine L1->L2->HDD caching is a funny thing. Now L1 is flushed to disk when there is cache pressure. So when is there cache pressure? When the disk is busy...so PC logic goes that at a time like that, it's perfectly desirable to add to the disk load by making an Urgent flush to the disk. Yes, to the L2 cache disk! No, not to the real slow overtaxed mechanical disk :( Someone really had a leap of logic about this.

Well, and this above issue of L2 data disappearing despite having been written on the L2 disk.
I am very glad I tested this thoroughly in advance. With real data this could've been a serious disaster, because one wouldn't expect already-written data to suddenly evaporate.
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: Power lost, L2 contents lost

Post by Jaga »

That's a normal side effect of PrimoCache being unable to tell whether the cache is clean or dirty after a power loss. After an ungraceful shutdown, changes to the drive while PrimoCache is offline, or other unknown changes to the drive contents, PrimoCache usually assumes the cache is dirty (i.e. it doesn't match what PrimoCache remembers), so it has to toss the contents of the cache and start over fresh.

This is particularly common when deferred writes are turned on, and they aren't completely flushed before the ungraceful event occurs.
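To illustrate the general idea, here is a minimal sketch of a generic "clean shutdown" flag. It is not PrimoCache's actual implementation; the names `CacheIndex`, `mount`, and `graceful_shutdown` are made up for the example, and the only thing taken from this thread is the behaviour: if the cache can't be proven clean at startup, its contents are discarded.

```python
from dataclasses import dataclass, field


@dataclass
class CacheIndex:
    """Hypothetical persisted cache metadata (illustration only)."""
    clean_shutdown: bool = True                   # flag stored next to the mapping
    mapping: dict = field(default_factory=dict)   # source sector -> cache slot


def mount(index: CacheIndex) -> CacheIndex:
    """Run at boot: trust the mapping only if the last shutdown was graceful."""
    if not index.clean_shutdown:
        index.mapping.clear()       # cannot prove the cache matches the disk, so reset it
    index.clean_shutdown = False    # mark "in use" until a graceful shutdown sets it back
    return index


def graceful_shutdown(index: CacheIndex) -> None:
    """Run on a normal shutdown, after all pending writes are flushed."""
    index.clean_shutdown = True
```

With a scheme like this, a forced power-off never reaches `graceful_shutdown`, so the next `mount` sees the flag still cleared and throws the mapping away, even though the cached data itself is still sitting on the SSD.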

I think Support is working on long-term fixes/adjustments to how the system works to try and avoid cache loss, but they'd have to chime in with the specifics to know for sure.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: Power lost, L2 contents lost

Post by Support »

As Jaga said, PrimoCache is unable to tell whether the cache is clean or dirty after a power loss, so it resets the cache to avoid possible problems. PrimoCache has an option to turn off this reset and preserve the cache even after a crash. However, we strongly recommend against enabling this option. We are working on validating cache data after a crash. This is an upcoming feature and may bring a better user experience.
ml70 wrote: Mon Dec 16, 2019 4:08 pm In other news PC is a great program but the lack of genuine L1->L2->HDD caching is a funny thing.
This is because PrimoCache was initially designed to achieve the best write performance and to handle more I/O. We are also working on a new option to allow writing from L1 to L2.
ml70
Level 2
Posts: 9
Joined: Sat Dec 07, 2019 6:31 am

Huge read/write spikes sometimes

Post by ml70 »

What could be the cause of this: huge read/write spikes that appear rarely, less than once a day, during which everything else grinds to a halt? The process doing this is System (4). The screenshot is from Process Hacker 2. This lasts for 5-6 minutes, during which the system is completely unusable. No data goes in or out of the application during this time. This does not happen without PrimoCache.

RAM Read cache on, Write cache off. L2 Read and Write cache on SSD drive, Deferred write on (Average, the small spikes on Disk chart), period 3600.

Image
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: Power lost, L2 contents lost

Post by Support »

Please upload a screenshot of the PrimoCache main dialog which shows the cache configuration and statistics.
How about using the Native write mode and 10 second latency?
ml70
Level 2
Posts: 9
Joined: Sat Dec 07, 2019 6:31 am

Re: Power lost, L2 contents lost

Post by ml70 »

Here's the pic and the L2 gather interval is 15 seconds.

In general everything is fine as long as there is free space on the L2 cache disk. But once the free space drops to 32 MB, funny things start to happen sooner or later. The large free L2 space right now is because I pre-emptively emptied the cache on reboot, after lockup-type things started happening again once the cache hit its maximum with only the default 32 MB left.

Unfortunately my trial has only 3 days left, so I'm not sure if I can troubleshoot this to conclusion. I purposefully chose Average write mode because that's what I need due to constant random-access read activity on the disk.

Image
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: Power lost, L2 contents lost

Post by Support »

ml70 wrote: Fri Jan 03, 2020 2:34 pm Unfortunately my trial has only 3 days left, so I'm not sure if I can troubleshoot this to conclusion.
I have sent you an email including a testing license. Please check.
ml70
Level 2
Posts: 9
Joined: Sat Dec 07, 2019 6:31 am

Re: Power lost, L2 contents lost

Post by ml70 »

Thank you, it'll take a while to rebuild the cache to the point where L2 is full, will follow up on what happens with v3.2.0 at that point.

What the UI refers to as "blocks" above: is this some fixed-size block, or the size configured as the block size? So 256 KB, in this case. I'm trying to understand how much data, for example, 1000 deferred blocks amounts to, which has been the recent average. If it is the configured size, it is surprisingly little compared to the size of the L2 cache.

Judging from disk I/O, the Average mode completely overrides any intended deferring of writes. My new Deferred Write timeout is 10801 seconds, which is carefully calculated to not overfill the cache, yet all writing activity is finished rather quickly by Average mode. I would have expected it to average the amount written over the deferred-write period or even longer, only starting to write after the deferred-write period has elapsed (this would be the best), but at a controlled, averaged speed so as not to disturb other disk I/O too much.

All disk read activity is small random reads (think database queries etc.), which is the worst type of load for an HDD, so I'm trying to cache and spread out the writes as smoothly as possible so as not to disturb the core load (reads) too much. Native and Intelligent modes seemed to make the writes too heavy. And the disk is almost never idle, which limits the use of the other options somewhat.
ml70
Level 2
Posts: 9
Joined: Sat Dec 07, 2019 6:31 am

Re: Power lost, L2 contents lost

Post by ml70 »

Of course I need to verify that I'm talking about the right thing, but when following the Disk tab in Process Hacker 2, these spikes seem to coincide with the number of Deferred Blocks dropping and writes happening to the HDD (as seen in Task Manager). And the spikes are sometimes huge, closer to 1 GB. This is not quite what I had in mind when thinking of averaging.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: Power lost, L2 contents lost

Post by Support »

ml70 wrote: Wed Jan 08, 2020 2:49 am What the UI refers to as "blocks" above: is this some fixed-size block, or the size configured as the block size? So 256 KB, in this case.
The whole cache space is sequentially divided into many small units of a fixed size. Each unit is called a block. The unit size is set by the cache configuration option "Block Size", in this case 256KB. If a block contains deferred write-data, even only 1 byte, the block is called a deferred block. The amount of data in a deferred block to be written to the target disk varies from 256 bytes to the block size (here 256KB), depending on how many bytes are deferred in the block.
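As a rough worked example of what that means for the "1000 deferred blocks" figure, the sketch below computes the lower and upper bounds on pending write data. It uses only the numbers from this thread (256 bytes minimum per deferred block, 256KB block size); the function name `deferred_bytes_bounds` is made up for the illustration.

```python
def deferred_bytes_bounds(deferred_blocks, block_size_bytes=256 * 1024, min_dirty_bytes=256):
    """Return (low, high) bounds in bytes for data waiting to be written.

    Each deferred block holds between min_dirty_bytes and block_size_bytes
    of deferred data, so the total lies between the two products.
    """
    low = deferred_blocks * min_dirty_bytes
    high = deferred_blocks * block_size_bytes
    return low, high


low, high = deferred_bytes_bounds(1000)
print(f"{low / 1024:.0f} KB to {high / 1024**2:.0f} MB")  # prints "250 KB to 250 MB"
```

So the deferred block count alone only gives a range: 1000 deferred blocks could mean anywhere from about 250 KB to 250 MB of actual writes to the target disk.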
ml70 wrote: Wed Jan 08, 2020 2:49 am Judging from disk I/O, the Average mode completely overrides any intended deferring of writes. My new Deferred Write timeout is 10801 seconds, which is carefully calculated to not overfill the cache, yet all writing activity is finished rather quickly by Average mode.
As far as I know, in the Average mode the input latency value is internally limited to a maximum of 120 seconds, so the results will be the same for any latency setting greater than 120 seconds.
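In other words, the behaviour described above amounts to a simple clamp. The sketch below is only an illustration of that statement, not PrimoCache code; the function name, the mode strings, and the assumption that other modes use the configured value unchanged are all hypothetical.

```python
# 120 s cap as described above ("as far as I know"); not a documented constant.
AVERAGE_MODE_MAX_LATENCY_S = 120


def effective_latency(configured_s, mode):
    """Return the latency actually used for a configured value (illustrative)."""
    if mode == "average":
        # Anything above the cap behaves the same as the cap itself.
        return min(configured_s, AVERAGE_MODE_MAX_LATENCY_S)
    # Assumption: other modes use the configured latency as-is.
    return configured_s


print(effective_latency(10801, "average"))  # prints 120
```

Under this reading, a latency of 10801 seconds in Average mode behaves like 120 seconds, which would explain why the writes finish much sooner than the configured period.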