L2 Cache content corruption without offline modification(possibly without restart)

SnowReborn
Level 4
Posts: 34
Joined: Sun Dec 02, 2012 3:13 am

L2 Cache content corruption without offline modification(possibly without restart)

Post by SnowReborn »

Windows OS: Windows 10 64bit 2004 19041.450
Hardware Information
    CPU: 8700k
    Main Board: msi z390 gaming plus
    Memory: 16GB x 4 G skill royale 3600 c19
    Hard Drives: 1TB SN550, Samsung 860 Evo 512gb, 8TB seagate barracuda, 16tb seagate exos
PrimoCache Version: 3.0.9
Screenshot(s) of your PrimoCache's main dialog showing cache configuration and statistics:
[Attachment: config 1.png]
Problem Description:
First, I really love this software. However, I ran into an issue with the L2 cache twice in a span of 3 days, with the cache content either out of sync or possibly corrupted. The issue doesn't happen all the time, and I am unable to reproduce it systematically. It never happened when I used an L1-cache-only setup. I have spent a lot of time searching the forums and Google and was unable to find a similar issue.

I recently discovered that I have a spare 512GB Samsung 860 Evo SSD. It is in relatively good condition, with only 200 days of power-on time and 7.5TB of total writes. I decided to use it as an L2 cache in addition to my L1 cache for my 8TB hard drive. However, after setting up the L2 cache, I experienced errors when opening Noxplayer (an Android emulator). The error message was different each time. The first time, the error message was:
[Attachment: error 1.png]
The second time, the error message was:
[Attachment: error 2.png]
Both issues were resolved immediately once I paused the cache, and they returned once the cache was resumed.

I am still unsure of the root cause of this error and have been unable to reproduce it reliably. Resetting the L2 cache or pausing the cache resolves the issue. I use the default settings for L1 and read-only for L2, and the defer-write option is disabled. There has never been any offline modification, and there is no dual-boot setup. I suspect the L2 cache is out of sync, even though I have not modified the source disk offline. The first time the error happened, I can't really recall whether I had restarted my PC, but I believe I had not restarted between setting up the L2 cache and the error appearing. The second time, I had restarted my PC once and was greeted by the error when I tried to launch Noxplayer. Because the FAQ states that PrimoCache checks on boot whether the L2 cache is in sync with the source disk, and resets the cache content if it detects that it is out of sync, I wanted to test whether PrimoCache could see that my L2 was out of sync. I therefore restarted my computer without resetting the L2 cache, even knowing it might be corrupted. On the next boot, PrimoCache did not reset the cache, and the Noxplayer error still persisted.

I would like to add that I also use a program called ISLC together with PrimoCache. ISLC clears the Windows memory cache. Because the Windows cache works at the file level and I want accurate PrimoCache statistics, I set up ISLC to clear the Windows file-level cache periodically. I do not think this can contribute in any way to the problem described above, since the problem only appeared with the L2 cache. I have since disabled the L1 cache to double-check that it is my L2 cache that is corrupted, and indeed it was.

This raises some concerns for me. I previously thought that, without defer-write enabled, PrimoCache should not add any corruption risk for the target drive, even in the event of power loss or a BSOD crash. However, after encountering this problem, I realized that if the read cache content is out of sync and PrimoCache is not able to self-correct or notify the user in time, the issue can accumulate and become very large-scale corruption. The benign scenario is total file corruption, as when my Noxplayer would not start at all: it tells the user that something is wrong with the cache content, and resetting the cache solves the issue without much harm, because the file is completely inaccessible in the first place and will not be written back to the source disk.

What I am afraid of is the following situation. Let's say I have a very important document "1.txt" whose content is just the string "0". Somehow, after 1.txt is cached to L2, the L2 cache goes out of sync and is corrupted in a way that flips only 1 bit, turning the string "0" into "1" or some other arbitrary character. Since the file itself is not structurally corrupted, it can still be opened normally by notepad.exe and loaded into memory, but with mutated content. So the next time I open 1.txt, it will show 1 instead of the original 0, because it is loaded from the cache instead of the target disk, and by saving this content I overwrite the original content on my cached source disk, leaving a very different file. This could be very dangerous to file integrity. Therefore I'd like to know whether PrimoCache has any way of preventing the above scenario, or, if not, whether we users can do anything to minimize the possibility of file corruption without constantly checksumming and imaging our files (which is not practical or realistic).
I am very much looking forward to version 4.0. Keep up the good work during this difficult time!
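For the "checksum our files" idea mentioned above, here is a minimal sketch of what a spot check could look like. It is only an illustration: the function names are made up for this example, and it assumes you take one manifest with the cache paused and another with it active, then compare the two. Note that the Windows file cache may also serve some of these reads, so a clean result does not prove the L2 content is good.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files present in both manifests whose digests differ."""
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
```

Usage would be something like `changed_files(build_manifest(root_paused), build_manifest(root_cached))`, where both manifests cover the same folder and nothing legitimately modified the files in between.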
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by Jaga »

You don't have Samsung RAPID enabled do you? It will conflict with Primocache since the two attempt to do the same thing (though Primocache is superior at it in my opinion).

I wouldn't use ISLC to clear the Windows cache - Primocache is meant to work alongside the native OS cache. Every time in the past when I've attempted to fiddle with Windows' cache manually it didn't give advantageous results. I'd set Windows caching back to defaults and stop using ISLC. Chances are that it doesn't have an impact, but when you manually change too many parameters, you never know where problems are coming from.

Best practices: disable any other caching mechanisms/software that might be running on the machine. Disable Intel RST if it's on the computer in any form. Disable fast boot, or any other motherboard feature that may impact drive reading/writing. Delete your cache tasks after this is all done and re-create them fresh for a new and clean test.

The other possibility is that the software just doesn't play nice with standard OS reads/writes. I don't specifically recall any titles, though we've seen some games that just didn't play well and wanted to do "their own thing" when it came to files on the drive. Support may have info or ask additional questions on your software if that is the case.
SnowReborn
Level 4
Posts: 34
Joined: Sun Dec 02, 2012 3:13 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by SnowReborn »

Jaga wrote: Wed Nov 11, 2020 5:09 am You don't have Samsung RAPID enabled do you? It will conflict with Primocache since the two attempt to do the same thing (though Primocache is superior at it in my opinion).

I wouldn't use ISLC to clear the Windows cache - Primocache is meant to work alongside the native OS cache. Every time in the past when I've attempted to fiddle with Windows' cache manually it didn't give advantageous results. I'd set Windows caching back to defaults and stop using ISLC. Chances are that it doesn't have an impact, but when you manually change too many parameters, you never know where problems are coming from.

Best practices: disable any other caching mechanisms/software that might be running on the machine. Disable Intel RST if it's on the computer in any form. Disable fast boot, or any other motherboard feature that may impact drive reading/writing. Delete your cache tasks after this is all done and re-create them fresh for a new and clean test.

The other possibility is that the software just doesn't play nice with standard OS reads/writes. I don't specifically recall any titles, though we've seen some games that just didn't play well and wanted to do "their own thing" when it came to files on the drive. Support may have info or ask additional questions on your software if that is the case.
Hi, I don't have any other caching software installed. I do not have Intel RST or Samsung Magician installed at all. All my partitions are on basic disks, and there is no RAID setup either. I understand that fiddling with the Windows cache won't make PrimoCache run any better. The only reason I clear it is to get more accurate readings from the PrimoCache stats, so I know how any given drive is being read, its cache hit rate, and so on. With the Windows cache on, the stats are skewed, because reads served from the Windows cache are not reported in the PrimoCache stats as cache hits. For this particular instance, I will disable ISLC and stop clearing the Windows cache just to see if it helps with the issue, even though that's not likely. Thanks for the advice.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by Support »

In our experience, this issue is usually caused by conflicts between PrimoCache and other software. Is there any third-party software installed that may have a caching or speed-up function? Could you check issues #014, #016, and #017 in the known issue list viewtopic.php?f=34&t=2174 ?
Besides, please disable or uninstall ISLC to confirm that it is not the cause. Additionally, since you have seen errors, you may need to pause or remove the cache and then run "chkdsk" on the affected volumes first to ensure the file systems on these volumes are good.
Thanks.
SnowReborn
Level 4
Posts: 34
Joined: Sun Dec 02, 2012 3:13 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by SnowReborn »

Support wrote: Thu Nov 12, 2020 4:43 am As per our experience, this issue is usually caused by the conflicts between PrimoCache and other software. Is there any third-party software which may has the caching or speed function? Could you check the issue #014, #016, #017 in the known issue list viewtopic.php?f=34&t=2174 ?
Besides, please disable or uninstall ISLC to confirm that this is not the cause. Additionally, as you have seen errors, so you may need to pause or remove the cache, and then do "chkdsk" on error volumes first to ensure the file system on these volumes are good.
Thanks.
Thanks for the reply. I checked #014, #016, and #017. I believe none of them apply in my case, as I don't have any of the mentioned software installed. I only have Avira as the antivirus on my computer, and I don't have any other caching software installed, as I mentioned above. chkdsk has been run, and the drive is relatively new. It is a Seagate 8TB. I also have a Seagate Exos X16 (512e / 4Kn), but it is not cached. I have read the known issue list and checked both drives with the "fsutil fsinfo ntfsinfo" command: both report 512 bytes per sector, while bytes per physical sector and bytes per cluster are 4096. So I assume it's safe to cache those drives, since they are working in 512 mode rather than 4K mode (to my knowledge, the majority of hard drives default to 512e, and only reformatting the drive with special software puts it in 4K mode). Since I don't know the root cause and have been unable to reproduce the problem systematically, I will disable ISLC and see if the problem appears again. I doubt it will help, though, since the Windows cache shouldn't cause any problem with L2 caching.
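For anyone who wants to repeat the sector-size check, here is a rough sketch that parses the "Bytes Per ..." lines of "fsutil fsinfo ntfsinfo" output and labels the drive. The helper names are made up for this example, the parsing assumes the English field labels, and fsutil usually needs to be run from an elevated prompt.

```python
import re
import subprocess


def parse_ntfsinfo(text: str) -> dict[str, int]:
    """Extract the 'Bytes Per ...' fields from fsutil ntfsinfo output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Bytes Per [A-Za-z ]+?)\s*:\s*([\d,]+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2).replace(",", ""))
    return fields


def sector_mode(fields: dict[str, int]) -> str:
    """Classify a volume from its logical and physical sector sizes."""
    logical = fields.get("Bytes Per Sector")
    physical = fields.get("Bytes Per Physical Sector")
    if logical == 512 and physical == 4096:
        return "512e"   # 4K physical sectors, 512-byte emulation
    if logical == 4096:
        return "4Kn"    # native 4K sectors
    if logical == 512:
        return "512n"   # native 512-byte sectors
    return "unknown"


def query_volume(drive: str) -> dict[str, int]:
    """Run fsutil on Windows for a volume such as 'D:' and parse the result."""
    result = subprocess.run(
        ["fsutil", "fsinfo", "ntfsinfo", drive],
        capture_output=True, text=True, check=True,
    )
    return parse_ntfsinfo(result.stdout)
```

With the values reported above (512 logical, 4096 physical), `sector_mode` would label both drives as 512e.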

Since you mentioned potential conflicts: I have Hard Disk Sentinel installed to monitor my disk health and SMART attributes, and I also have Macrium Reflect installed on my machine with CBT and Image Guard enabled. Macrium Reflect is imaging / backup software. If anything could cause a conflict, what comes to mind is the CBT (Changed Block Tracker, for faster incremental imaging) feature of Macrium Reflect. Do you advise turning these 2 features off as well?

Lastly, I am curious about the question and concern from my first post regarding the worst case of L2 corruption (a valid file with mutated content). Does PrimoCache have any mechanism to check cache integrity or to guard against bit flips / bit rot, or otherwise prevent the situation described above? This problem didn't cross my mind until my L2 cache was corrupted, and I believe it could lead to serious consequences, so I'd like to hear some advice from you. Thanks again, and I will keep you updated.
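To make the integrity question concrete, here is a toy sketch of one generic way a read cache could detect a block that changed after it was cached: keep a CRC32 next to each cached block and fall back to the source on a mismatch. This is not a description of how PrimoCache works internally; the class and its fields are hypothetical. It also would not help if the data was already wrong when it was first cached.

```python
import zlib


class VerifiedReadCache:
    """Toy read cache that stores a CRC32 alongside each cached block."""

    def __init__(self, source: dict[int, bytes]):
        self.source = source                  # stands in for the source disk
        self.blocks: dict[int, bytes] = {}    # stands in for the L2 SSD
        self.crcs: dict[int, int] = {}        # checksum recorded at fill time
        self.discarded = 0                    # entries dropped after a mismatch

    def read(self, block: int) -> bytes:
        cached = self.blocks.get(block)
        if cached is not None:
            if zlib.crc32(cached) == self.crcs[block]:
                return cached                 # cache hit, content verified
            # Cached copy no longer matches its checksum: drop it.
            del self.blocks[block]
            del self.crcs[block]
            self.discarded += 1
        data = self.source[block]             # miss or discarded entry
        self.blocks[block] = data
        self.crcs[block] = zlib.crc32(data)
        return data
```

The cost is a checksum computation on every hit, which is presumably part of why such checks are not free for a cache that is meant to be fast.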
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by Support »

You could disable/uninstall Macrium Reflect (and ISLC) and see if the problem still happens.
PrimoCache will clear the cache (or validate the data, since v4.0) if it detects an ungraceful shutdown. If the computer is shut down gracefully, PrimoCache will not check data integrity at startup because doing so is time-consuming.
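The policy described here can be pictured with a small toy model. This is only a sketch of a common "clean shutdown marker" pattern, assumed for illustration; it is not PrimoCache's actual mechanism, and the class is hypothetical. It shows why cache content that is already wrong survives a normal reboot: nothing is re-checked when the marker says the last shutdown was clean.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentCache:
    """Toy persistent cache guarded by a clean-shutdown marker."""

    blocks: dict = field(default_factory=dict)
    clean_shutdown: bool = True   # marker that would be stored on the cache device

    def start(self) -> bool:
        """Start the cache; return True if the content had to be cleared."""
        cleared = False
        if not self.clean_shutdown:
            self.blocks.clear()       # previous session ended ungracefully
            cleared = True
        self.clean_shutdown = False   # stays False until shutdown() runs
        return cleared

    def shutdown(self) -> None:
        """Graceful shutdown: content is trusted on the next start."""
        self.clean_shutdown = True
```

In this model a crash simply means `shutdown()` is never called, so the next `start()` clears everything, while a graceful restart keeps whatever is in `blocks`, correct or not.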
SnowReborn
Level 4
Posts: 34
Joined: Sun Dec 02, 2012 3:13 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by SnowReborn »

Support wrote: Thu Nov 12, 2020 10:01 am Perhaps you may disable/uninstall Macrium Reflect (also ISLC) and see if the problem will still happen.
PrimoCache will clear the cache (or validate the data since v4.0) if it detects ungraceful shutdowns. If computer is gracefully shutdown, PrimoCache will not check data integrity at startup because it's time-consuming.
Thanks for the reply. I will do as you suggest, uninstall Macrium Reflect and ISLC, and try to reproduce the problem. As for cache corruption, I have thought about it more; please correct me if I am wrong. The corruption would only happen if PrimoCache incorrectly reads from the source disk, or writes a different value to the cache, at the time the content is first cached, due to some fault or conflict. In other words, the value is already wrong right after PrimoCache caches the content for the first time. It should be impossible, or at least very rare, for correct cache content to become corrupted or mutate over time, since most modern HDDs and SSDs already have built-in ECC that checks integrity at the block level in the hardware controller. So in the case of my L2 cache corruption, incorrect block values were already cached into L2 when the data was first read from the source disk. What is really weird is that this only happened after I tried the L2 cache. I have been using an L1-only cache for a while and never encountered such an issue. If PrimoCache really were reading or writing values incorrectly, the problem should also affect the L1 cache, yet it doesn't.
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by Jaga »

Reflect isn't the problem; if it were, I would have seen it -long- ago. :) But I would disable any features it uses to monitor drive/volume changes (basically just use it for scheduled backups, with nothing in the way of live monitoring). Reflect is stellar software for standard backup usage, though I could easily see its 'advanced' features impeding Primocache's functionality.
SnowReborn
Level 4
Posts: 34
Joined: Sun Dec 02, 2012 3:13 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by SnowReborn »

Jaga wrote: Fri Nov 13, 2020 7:24 pm Reflect isn't the problem, if it was I would have seen it -long- ago. :) But I would disable any features to monitor drive/volume changes that it uses (basically just use scheduling for backups on it, but nothing else as far as live monitoring goes). Reflect is stellar software for standard backup usage, though I could easily see it's 'advanced' features as impeding Primocache's functionality.
Thanks for the reply! I typed this on my phone, so excuse me if there are many grammatical errors and typos. I have been quite busy lately and haven't had time to try to reproduce the issue in a controlled environment. However, I did run a simple experiment on L2 cache corruption to confirm the concern from my very first post, and I have some very bad news to share :? :cry: .

I made an empty partition, created an "important.png" file, colored everything in gray, and saved it. I then made a 128 MB L2 cache, read-only, for the partition containing the image file. Next I opened "important.png", which gets it stored in the L2 cache, and opened it a few more times to see the hit rate go up. I then used DiskGenius (a disk utility) to open the raw hex view of my L2 caching partition, searched for the hex value "7F" (which corresponds to the color gray), overwrote half of the occurrences with "00" (which corresponds to black), and saved the change. Now when I open "important.png", it shows half of the image gray and half black, reflecting the changes I made in DiskGenius and simulating corruption in L2. If I pause the cache and open the image again, I get the original image, which is the expected behavior. This is the same behavior as my broken Android emulator, except that here the corruption is in the content rather than the file header, so the file is still accessible but with mutated content, and no error is thrown on opening. And even after I have opened the original file this way, PrimoCache does not update the corrupted L2 content.

Now, the thing that worries me the most. If I resume the cache, opening the image again shows the corrupted gray/black version. If, instead of just opening the image, I click edit, the image editor also shows the corrupted content (half black, half gray), and if I proceed to save the file, it overwrites the original file on the disk with the corrupted content! (This is also what I assumed in my first post.) After that, regardless of whether I pause, resume, or reset the L2 cache, my "important.png" is the mutated version and the original file is gone.

I understand this is an extremely rare case that most likely won't happen: the original content is only overwritten if the corruption hits the content rather than the header, and the file then somehow gets saved. However, there are still many situations where this could happen. For instance, an application's config file may be cached and then saved on exit. Or an SQL database may be read incorrectly and overwritten with bad values because of L2 corruption. Or a hardware failure on the L2 cache device, for instance bit rot, may flip one of the values of the cached content, with the same result. I understand PrimoCache has taken some measures against potential issues like this. For one, the L2 caching partition has no drive letter, so most applications shouldn't be able to access it unless they have very high privileges, like disk utility software or kernel-level drivers and services.
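The experiment above can be restated as a toy model. This is only a sketch with a hypothetical class: a single 8-byte "block" stands in for the image, and editing the cache dictionary directly stands in for the hex edit of the L2 partition. It is not how PrimoCache is implemented; it just shows the sequence read from cache, save, and the source ends up overwritten.

```python
class ReadOnlyBlockCache:
    """Toy read cache: reads may come from the cache, writes go straight to the source."""

    def __init__(self, source):
        self.source = source      # dict: block number -> bytes, stands in for the HDD
        self.cache = {}           # stands in for the L2 SSD partition
        self.paused = False

    def read(self, block):
        if self.paused:
            return self.source[block]
        if block not in self.cache:
            self.cache[block] = self.source[block]   # first read fills the cache
        return self.cache[block]                     # later reads never touch the source

    def write(self, block, data):
        self.source[block] = data                    # no deferred write
        if block in self.cache:
            self.cache[block] = data                 # keep the cached copy in step


def simulate():
    gray, black = b"\x7f", b"\x00"
    original = gray * 8
    corrupted = gray * 4 + black * 4
    disk = {0: original}
    cache = ReadOnlyBlockCache(disk)

    cache.read(0)                      # open the image once so it gets cached
    cache.cache[0] = corrupted         # simulated hex edit of the cache partition
    seen_with_cache = cache.read(0)    # half gray, half black

    cache.paused = True
    seen_when_paused = cache.read(0)   # original image from the source disk

    cache.paused = False
    cache.write(0, cache.read(0))      # open in an editor, then save
    cache.paused = True
    on_disk_after_save = cache.read(0) # the source now holds the corrupted data
    return seen_with_cache, seen_when_paused, on_disk_after_save
```

In this model nothing between the cache and the application can tell the corrupted block from a valid one, which is the core of the concern.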

I only know how caching works at a very surface level, so correct me if I'm wrong; I'd love to learn more about this topic. Since I don't know a lot about caching, I started to wonder whether many other caching solutions share similar weaknesses or risks, for instance the solutions Jaga and Support mentioned above, Samsung Magician, Intel RST, AMD StoreMI, etc. I also wonder whether the Windows file-level cache has problems like this, or whether they only occur with block-level caching.

Lastly, I want to stress that my findings DO NOT show that PrimoCache currently poses a risk to users' data. I used a disk utility to directly change the raw data on the caching partition, which is a very unrealistic situation that will probably never happen in real-life usage (my Android emulator aside), and the experiment was only done to confirm my suspicion. I suspect my previous issue with the Android emulator could be what Support and Jaga mention: some third-party software with high privileges that can modify disk content while bypassing PrimoCache, or that was somehow able to modify the L2 cache partition. But I also suspect there is a slight possibility that PrimoCache failed to read the original source correctly the first time it was cached and saved an incorrect value. Any thoughts on this issue would be greatly appreciated! Thanks.
Support
Support Team
Posts: 3623
Joined: Sun Dec 21, 2008 2:42 am

Re: L2 Cache content corruption without offline modification(possibly without restart)

Post by Support »

Actually, this is a kind of "Offline Modification", which users should avoid. PrimoCache has no way to prevent such modifications. Please see the "Potential Problem and Notice" section on the page https://www.romexsoftware.com/en-us/pri ... cache.html.

Regarding the earlier corruption issue, could you open the Registry Editor, navigate to the branch "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Class\{71a27cdd-812a-11d0-bec7-08002be2092f}", and send us a screenshot of its values? We'd like to check the values in "LowerFilters" and "UpperFilters".
Please do the same for the branch "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Class\{4d36e967-e325-11ce-bfc1-08002be10318}".
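If a text listing is easier than a screenshot, the same two values can be read with a short script. This is a minimal sketch using Python's standard winreg module (Windows only). The labels "Volume" and "DiskDrive" are our reading of what these two class GUIDs stand for, and the function names are just for this example.

```python
import sys

CLASS_KEY = r"System\CurrentControlSet\Control\Class"
DEVICE_CLASSES = {
    "Volume": "{71a27cdd-812a-11d0-bec7-08002be2092f}",
    "DiskDrive": "{4d36e967-e325-11ce-bfc1-08002be10318}",
}
FILTER_VALUES = ("UpperFilters", "LowerFilters")


def class_key_path(guid: str) -> str:
    """Build the registry path (below HKEY_LOCAL_MACHINE) for a class GUID."""
    return CLASS_KEY + "\\" + guid


def read_filters() -> dict[str, dict[str, list[str]]]:
    """Read the filter driver lists for both classes (Windows only)."""
    import winreg  # standard library, only available on Windows

    result = {}
    for name, guid in DEVICE_CLASSES.items():
        result[name] = {}
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, class_key_path(guid)) as key:
            for value_name in FILTER_VALUES:
                try:
                    value, _value_type = winreg.QueryValueEx(key, value_name)
                except FileNotFoundError:
                    value = []    # this value is not set for the class
                result[name][value_name] = list(value)
    return result


if __name__ == "__main__" and sys.platform == "win32":
    for device_class, filters in read_filters().items():
        print(device_class, filters)
```

The script only reads the registry and changes nothing.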
Thank you.