Event 129 secnvme

Support
Support Team
Posts: 2759
Joined: Sun Dec 21, 2008 2:42 am

Re: Event 129 secnvme

Post by Support »

@neatchee, thank you for the bug report!
Could you tell us your computer's hardware configuration (motherboard/CPU/RAM/storage disks) and send a screenshot of the PrimoCache main dialog showing the cache configuration and statistics? Also, which game were you running?
Thanks a lot!
Primo Ramdisk | PrimoCache
Romex Software Support

neatchee
Level 5
Posts: 49
Joined: Tue Feb 12, 2019 8:38 pm

Re: Event 129 secnvme

Post by neatchee »

Motherboard: Asus Prime Z270-A
BIOS: Rev 1302 (latest)
CPU: Intel Core i7-7700K
RAM: 16GB Kingston HyperX Black DDR4-3000 (2x8GB)
Storage 1: (C:\, Operating System) - KINGSTON SSDNow V300 - 120GB (SV300S37A120G)
Storage 2: (D:\, Bulk Storage, Games, etc.) - HGST Deskstar 4TB 7200RPM (HDN726040ALE614)
Storage 3: (E:\, Scratch, Documents, etc.) - OCZ-VERTEX2 - 120GB (OCZSSD2-2VTXE120G)
Storage 4: (L2CACHE for D:\) - Samsung 970 EVO - 500GB (MZ-V7E500BW)

Game(s) causing issue: Anthem, Destiny 2

And here is the screenshot you requested...
PLEASE NOTE: Whenever this crash occurs, the cache is cleared (good!). This screenshot was taken after re-populating the cache with only the Anthem data (by triggering a full read on all files in the game directory, using Python), so the cache hit rate is obviously low.
Under normal usage I get a 99+% cache hit rate.
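
For anyone who wants to re-populate a read cache the same way, here is a minimal sketch of a "full read" over a directory. It is not the exact script used above; the function name `warm_cache` and the 1 MiB chunk size are assumptions chosen for illustration.

```python
import os


def warm_cache(root, chunk_size=1024 * 1024):
    """Read every file under `root` once and discard the data.

    Returns a (files_read, bytes_read) tuple. The reads themselves are
    what cause a block-level cache to pick the data up.
    """
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Read in fixed-size chunks so large game archives
                    # never have to fit in memory at once.
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
                files_read += 1
            except OSError:
                # Skip files that are locked or otherwise unreadable.
                continue
    return files_read, bytes_read
```

Calling `warm_cache(...)` with the game's install directory should touch every file once. Whether a single pass is enough to place the data in the L2 cache depends on the cache configuration.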


Image
Last edited by neatchee on Wed Mar 06, 2019 8:44 am, edited 1 time in total.

Re: Event 129 secnvme

Post by Support »

Thank you very much for your detailed information! We'll try to set up a similar computer and do the testing.

Re: Event 129 secnvme

Post by neatchee »

Some more information from testing:
  • I removed some applications that were regularly querying the device (e.g. hardware monitoring software) to make sure they weren't interfering
  • I tried setting the M.2 PCIe lanes to x2 (instead of x4)
Neither of these helped :(
However, it is important to note that when I switched to x2 PCIe lanes, the driver reporting the error was different: storahci
(This is expected, but noteworthy because it means that this problem is not specific to the NVMe specification; it will happen with SATA/AHCI too)
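
To compare which driver is reporting Event 129 before and after a change like this, a small tally can help. This is only a sketch: it assumes the log records have already been exported and loaded as dictionaries (for example via `csv.DictReader`), and the column names `Source` and `Event ID` are assumptions that may need adjusting to match the actual export.

```python
from collections import Counter


def count_event_129(rows, source_key="Source", id_key="Event ID"):
    """Count Event 129 records per reporting driver.

    `rows` is any iterable of dict-like records. The key names are
    assumptions and can be overridden to match the export format.
    """
    counts = Counter()
    for row in rows:
        if str(row.get(id_key, "")).strip() == "129":
            source = str(row.get(source_key, "unknown")).strip()
            counts[source] += 1
    return counts
```

A result with entries for both secnvme and storahci would match what is described above: the same reset event, reported by a different driver depending on the lane setting.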

Next steps:
  • I have requested a warranty replacement from Samsung, just to make sure the device isn't defective
  • I am getting another M.2 SSD - Crucial MX500 M.2 SATA (not NVMe) - and will see if I can reproduce the issue
I would be happy to help collect additional test data! I work as a software tester at a big video game studio (you've definitely heard of our games, heh), so I can do some advanced debugging if you aren't able to reproduce the issue yourselves.

Re: Event 129 secnvme

Post by Support »

We do appreciate your testing! We're looking forward to the results.

Re: Event 129 secnvme

Post by neatchee »

My current theory is that this is a hardware-level defect: a flaw in the design of the 970 EVO, poor motherboard design, or a common manufacturing defect of the 970 EVO.
  • I was able to reproduce this issue (only once) using the Samsung Magician benchmarking utility.
  • If Link Power Management is enabled, I see dramatic instability for the device even without PrimoCache.
  • This suggests a device-level failure, likely related to voltage, during very high throughput or when transitioning out of low power states quickly (when waking up for high throughput requests).
  • The M.2 slots on my motherboard are run through the PCH (not the dedicated PCIe controller in the CPU), so I began experimenting with small voltage adjustments.
  • After increasing the PCH voltage by approx. 0.02 V, I believe I am seeing increased stability.
These results are preliminary, so not confirmed yet. Needs more testing time. :)

EDIT: No dice. Took a little longer than previous cases, but same behavior. I'm not inclined to continue tuning voltage for this issue until I've tested a replacement drive.

Re: Event 129 secnvme

Post by Support »

Interesting finding! :)

Jaga
Level SS
Posts: 525
Joined: Sat Jan 25, 2014 1:11 am

Re: Event 129 secnvme

Post by Jaga »

Just to add to the information on this topic:

I have a new 1TB Samsung 970 EVO and a 32GB L1 read/write (w/deferred writes) Cache Task in PrimoCache (that is caching the EVO), and don't have any problems whatsoever. PrimoCache never has problems, and I never see any errors thrown.

It sounds like the problems you're seeing may be related to the special drive access that PrimoCache uses on the L2STORAGE volume, which in your case is the NVMe. It's definitely not a device defect, since PrimoCache can talk just fine with the 970 EVO when it's a cached volume.

Hopefully Samsung Magician isn't running all the time either, since it's a drive management utility that -can- mess with caching on a drive if it's enabled at Windows startup.

Re: Event 129 secnvme

Post by neatchee »

Jaga wrote: "It sounds like the problems you're seeing may be related to the special drive access that PrimoCache uses on the L2STORAGE volume"
This is my best guess as well, considering I don't seem to have problems just using the drive on its own as storage.

Jaga wrote: "It's definitely not a device defect, since PrimoCache can talk just fine with the 970 EVO when it's a cached volume."
I don't necessarily agree with this assessment. In fact, quite the opposite: using the 970 EVO as a cache volume, instead of the volume being cached, dramatically increases the workload, so any device defect would have more opportunity to manifest (especially if it's related to voltage instability or something similar). Besides, the cache operates normally for some time before the failure occurs, so it's not as simple as "PrimoCache = issue".

Jaga wrote: "Hopefully Samsung Magician isn't running all the time either, since it's a drive management utility that -can- mess with caching on a drive if it's enabled at Windows startup."
Not sure how I feel about this... I'm currently running Magician at startup, but it's also almost completely passive. The only active operation it performs on the drive, as far as I'm aware, is running a benchmark test. Otherwise it's just there to read S.M.A.R.T. values and check firmware.
I haven't tried running without Magician since I initially upgraded the drive's firmware, so maybe I'll give that a shot if the RMA replacement doesn't help.

Re: Event 129 secnvme

Post by Jaga »

Will be interesting to hear what the replacement drive does for you.