Primocache, for the second time, has corrupted my 40TB ReFS Storage Space to RAW

compsmith
Level 1
Posts: 1
Joined: Fri Feb 03, 2023 2:43 pm

Post by compsmith »

Once again, I am absolutely sickened: after a normal reboot following a Windows Server 2019 update, my 40TB ReFS parity Storage Space is inaccessible and reports as RAW in Disk Management. Almost the same thing happened a few months ago, when I had to copy 20TB+ off the array to external drives with ReclaiMe File Recovery, then reformat the array and copy the data back (a 2-3 week process).

After the reboot, the Storage Space was put into an offline state. Bringing it back online resulted in it displaying the correct drive letter (D:), but in a RAW state.
Event Viewer logs the following entry about once per second:

ReFS failed to mount the volume.
Context: 0xffffc901cc49a180
Error: The volume repair was not successful.

Hardware Specs
Dell Poweredge server
Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz
32GB ECC RAM
2TB NVMe SSD (half for the OS, half for L2 cache)
8x 6TB WD Red NAS hard drives

Software & Configuration
Windows Server 2019
Storage Space, thin-provisioned parity (16KB interleave, 5 columns)
ReFS, formatted with 64K clusters, integrity streams enabled
50 Minutes of UPS battery backup
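
As a quick check of the numbers above, here is a minimal sketch of how the interleave and column count relate to the 64K cluster size. It assumes single parity, where each stripe spends one column's worth of space on parity; the function name is hypothetical and is not part of any Storage Spaces tooling.

```python
def full_stripe_data_kib(interleave_kib: int, columns: int, parity_columns: int = 1) -> int:
    """Data (excluding parity) carried by one full stripe, in KiB.

    Assumes `parity_columns` columns' worth of each stripe hold parity.
    """
    if columns <= parity_columns:
        raise ValueError("need more columns than parity columns")
    return interleave_kib * (columns - parity_columns)


# Values from the configuration above: 16KB interleave, 5 columns.
stripe_kib = full_stripe_data_kib(16, 5)
cluster_kib = 64  # ReFS cluster size from the configuration above

# A cluster that is a whole multiple of the stripe's data size can be
# written as full stripes rather than partial ones.
aligned = cluster_kib % stripe_kib == 0
print(stripe_kib, aligned)
```

Under that assumption, one full stripe carries 64 KiB of data, which matches both the 64K ReFS cluster size and the 64KB PrimoCache block size listed in this post.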

PrimoCache Server
(1TB L2 Cache 50%/50% R/W)
64KB Block Size
Deferred-Write 60

I have been using ReFS Storage Spaces for years and never had this catastrophic corruption issue until I started using PrimoCache. Now I have to spend weeks copying data back unless your dev team has any ideas on how to get the ReFS partition back. The data is there; I can see it with ReclaiMe File Recovery. Various tools recognize the volume as ReFS. I tried the AOMEI Partition Assistant partition recovery wizard, but had no luck. If nothing can be done, I will need to ask for a refund, because I can no longer use this software: it is unreliable with my configuration.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Post by Support »

We are very sorry for the trouble this has caused you! Since you have submitted a ticket, we have responded via email. Please check your inbox.
tverweij
Level 6
Posts: 74
Joined: Thu May 10, 2018 9:27 am

Post by tverweij »

I cache almost exclusively ReFS volumes (Windows Server 2019) on multiple servers without any problems, so the problem must be Storage Spaces, not ReFS.
Nick7
Level 5
Posts: 46
Joined: Sun Jun 25, 2017 7:50 am

Post by Nick7 »

You have dedicated ~250GB to write cache.
If it is not all flushed to disk before a reboot, this can happen.

Sorry, but you are using deferred write with quite a large write cache. Deferred write can and will cause corruption if L1/L2 is not completely flushed before a reboot.
On the other hand, you are using parity with 6 drives, which is not ideal, and the write performance of that pool is probably quite poor. This means it may take quite a long time to flush L1/L2 to the drives before the reboot actually occurs. Interrupting the flush and forcing a reboot will cause data loss, and possibly complete corruption of the file system.

Since you have 'important' data on your SS pool, you should NOT use deferred write.
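
To put a rough number on the flush-time argument above, here is a back-of-the-envelope sketch. The ~250GB figure is the one quoted in this post. The 50 MiB/s sustained write rate is purely an assumption for illustration, not a measurement of this pool, and the calculation treats the entire write cache as unflushed, which is a worst case.

```python
def flush_seconds(dirty_gib: float, write_mib_per_s: float) -> float:
    """Seconds needed to write `dirty_gib` GiB at a sustained `write_mib_per_s` MiB/s."""
    if write_mib_per_s <= 0:
        raise ValueError("write throughput must be positive")
    return dirty_gib * 1024 / write_mib_per_s


# 250: write-cache size quoted above (treated here as GiB).
# 50:  ASSUMED sustained parity-pool write rate in MiB/s (hypothetical).
worst_case = flush_seconds(250, 50)
print(f"{worst_case:.0f} s (~{worst_case / 60:.0f} min)")
```

With those assumed inputs, a full flush would take well over an hour, far longer than a reboot normally waits. Substitute a measured write rate for the pool to get a realistic estimate.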
tverweij
Level 6
Posts: 74
Joined: Thu May 10, 2018 9:27 am

Post by tverweij »

@compsmith

I see that you use a deferred write of 60 seconds.
I wouldn't dare use that on important data.

I split my data into two parts:
1. Unimportant (logs, swap files, SQL Server temp database, etc.). I really don't care if this data gets lost.
2. Important (everything else)

For 1 (unimportant), I use a 60-second deferral; for 2 (important), I don't go higher than 1 second.
And I only dare to use that 1 second because I run in a data center with battery and generator backup on two different power sources, each with its own battery and generator backup, so everything has a 100% power guarantee for at least 2 weeks without grid power.

Further, I made sure that Windows won't kill the PrimoCache service on a reboot or shutdown.
Normally, Windows kills services that do not shut down within 5 seconds.
With the registry key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control] "WaitToKillServiceTimeout"="60000", I changed that to one minute, giving PrimoCache time to flush the deferred data.
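
For anyone who wants to apply the same setting, here is a minimal sketch that generates the text of a .reg file for the key and value named above. The helper name `build_reg_text` is hypothetical. The script only builds the text; saving it to a file and importing it (with administrator rights) is left to the reader, and whether 60000 ms is long enough depends on how much deferred data has to be flushed.

```python
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
VALUE_NAME = "WaitToKillServiceTimeout"


def build_reg_text(timeout_ms: int) -> str:
    """Return .reg file text that sets the service kill timeout (string value, in ms)."""
    if timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        f'"{VALUE_NAME}"="{timeout_ms}"',  # stored as a string value, as in the post
        "",
    ]
    return "\n".join(lines)


# 60000 ms = one minute, the value used in the post above.
reg_text = build_reg_text(60000)
print(reg_text)
```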

But running any volume with important data on it, without backups, with or without primocache, is irresponsible.
rv112
Level 1
Posts: 1
Joined: Fri Jul 07, 2023 4:01 pm

Post by rv112 »

Today I had the same issue. Event Viewer shows "Defer write failed". I saw that the whole storage pool went offline. I was able to bring it back online and lost only one file, which I was copying when this happened.

Windows Server 2022, 40TB Storage Pool with parity. PrimoCache Server 4.2.0.
tverweij
Level 6
Posts: 74
Joined: Thu May 10, 2018 9:27 am

Post by tverweij »

Did you check the logs to see what happened first?

Did it first log a deferred write failure, or did it first log that the storage pool went offline?
S2GUnit
Level 1
Posts: 4
Joined: Wed Apr 26, 2023 4:43 pm

Post by S2GUnit »

I had a similar issue tonight. I'm testing PrimoCache for the first time but had to restore a Macrium backup.

My PC has an Optane 905p split into 2 partitions.

After I restored the backup of my C drive, my D drive was left as RAW and had to be formatted. Luckily, before doing the Macrium restore, I backed up my whole D drive.

I tested a Macrium backup again this morning, and my D drive was seen as RAW when my PC rebooted into Macrium's restore boot menu.
The D drive which keeps appearing as RAW,

How can I prevent that issue from happening again? Do I need to flush all of the cache before the restore?

Thanks
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Post by Support »

S2GUnit wrote: Tue Aug 01, 2023 4:12 am
I had a similar issue tonight. I'm testing PrimoCache for the first time but had to restore a Macrium backup.

My PC has an Optane 905p split into 2 partitions.

After I restored the backup of my C drive, my D drive was left as RAW and had to be formatted. Luckily, before doing the Macrium restore, I backed up my whole D drive.

I tested a Macrium backup again this morning, and my D drive was seen as RAW when my PC rebooted into Macrium's restore boot menu.
The D drive which keeps appearing as RAW,

How can I prevent that issue from happening again? Do I need to flush all of the cache before the restore?

I'm sorry for the late reply. Can you tell me which drives were cached by PrimoCache when you restored the backup, and what the cache configuration was? I'd appreciate it if you could upload your cache configuration by following the guidance in the link below.
https://kb.romexsoftware.com/en-us/2-pr ... leshooting

Our initial suspicion is that this problem was caused by cache inconsistencies after restoring the backup. That's why we need the cache configuration you had at that time for analysis. Thank you.

P.S. If possible, use the NTFS file system instead of ReFS if you use caching.