Samsung 980 Pro IOPS almost halved with Primo L1 cache - what am I doing wrong?
Posted: Mon Apr 04, 2022 3:36 pm
Greetings,
I am interested in using PrimoCache to increase the serial read performance of my NVMe drives (Samsung 980 Pros on PCIe 4). I've tested with 2GB, 4GB, & 8GB L1 RAM caches (4K block size) and get the same results each time: serial read performance increases dramatically, but IOPS are almost halved. Without Primo I get almost 1 million IOPS; with Primo it's just over 500K IOPS. I tried increasing the block size all the way up to 64K and saw no increase in IOPS (the 980s are formatted with 4K clusters, if I'm not mistaken), but I did see a large decrease in serial read performance. My understanding was that an L1 RAM cache should increase IOPS, not decrease them, so this drop is definitely not what I expected to see.
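For scale, IOPS and throughput are tied together by the transfer size. The sketch below converts the two IOPS figures above into bytes per second. It assumes the benchmark issues 4 KiB reads; the post does not name the benchmark tool, queue depth, or transfer size, so treat that as an assumption.

```python
# Assumption: the benchmark issues 4 KiB reads (tool and queue depth are not stated).
KIB = 1024


def iops_to_bytes_per_sec(iops: int, transfer_bytes: int) -> int:
    """Throughput implied by an IOPS figure at a fixed transfer size."""
    return iops * transfer_bytes


# Figures quoted above: ~1M IOPS without PrimoCache, just over 500K with it.
for label, iops in [("without PrimoCache", 1_000_000), ("with PrimoCache", 500_000)]:
    bps = iops_to_bytes_per_sec(iops, 4 * KIB)
    print(f"{label}: {iops:,} IOPS x 4 KiB = {bps / 1e9:.2f} GB/s")
```

The same relation means IOPS figures are only directly comparable when they are measured at the same transfer size.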
The only explanation that occurred to me, based on what I was seeing, was that the DRAM module on the 980 might be almost twice as fast as my system RAM. I looked that up: the 1TB & 2TB 980 Pro use an LPDDR4 DRAM module and are rated for up to 1 million IOPS, while my system RAM is Corsair DDR4 (not low power), rated at 3,600MHz via DOCP with a CAS latency of 16. I didn't find an IOPS rating for the Corsair RAM, or a speed or CL rating for the Samsung DRAM module. Since the 980 uses low-power DDR and my system RAM doesn't, I find it hard to believe that the Samsung DRAM is capable of nearly twice as many IOPS as my system RAM, given that my system RAM handles so many things beyond storage I/O...many of them 'simultaneously'. Is my intuition here off-base? If not, what am I missing? Is this due to unrealistic expectations on my part, or to a lack of knowledge about how to properly configure the L1 cache in Primo?
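To put the system RAM side in numbers, here is a rough sketch of the theoretical peak bandwidth and CAS latency for DDR4-3600 in dual channel. It assumes "3,600MHz" is the usual DDR4 marketing figure, meaning 3600 MT/s on an 1800 MHz I/O clock. These are peak figures only: the sketch does not model the CPU, the memory controller, or PrimoCache's own per-request overhead, so it can bound the RAM itself but cannot explain the measured result.

```python
# Assumption: "3,600MHz" means 3600 MT/s (1800 MHz I/O clock, two transfers per clock).
BUS_BYTES_PER_CHANNEL = 8  # each DDR4 channel has a 64-bit data bus


def ddr_peak_bytes_per_sec(mt_per_s: float, channels: int) -> float:
    """Theoretical peak transfer rate; sustained bandwidth is always lower."""
    return mt_per_s * BUS_BYTES_PER_CHANNEL * channels


def cas_latency_ns(cl_cycles: int, mt_per_s: float) -> float:
    """CAS delay in ns: CL cycles of the I/O clock, which runs at half the transfer rate."""
    io_clock_hz = mt_per_s / 2
    return cl_cycles / io_clock_hz * 1e9


peak = ddr_peak_bytes_per_sec(3600e6, channels=2)  # 57.6 GB/s
cas = cas_latency_ns(16, 3600e6)                   # about 8.9 ns
needed = 1_000_000 * 4096                          # 1M IOPS at 4 KiB = 4.096 GB/s
print(f"peak {peak / 1e9:.1f} GB/s, CAS {cas:.2f} ns, "
      f"1M x 4 KiB IOPS is {needed / peak:.1%} of peak")
```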
I've searched the Romex forums for the term IOPS and Googled various combinations of "primocache", "limited", and "iops", and I read pretty much every thread in the first several pages of results from both sites that had even the slightest connection to IOPS limitations with PrimoCache, plus several others that looked like they might at least provide some insight. Unfortunately, I wasn't able to find anything that explains what I'm seeing: there's lots about SATA SSD & M.2 NVMe L2 caching schemes causing IOPS losses, but nothing about this issue when using only an L1 cache. I even checked the quick start guide...no help there either.
Additional data point: I do not have a UPS and we do occasionally (although rarely) have power outages, so delayed write is not really a good option for me.
Here are the general specs for my box that I *think* apply to this situation:
Ryzen 9 5950X / 64GB (4x16GB) DDR4 @ 3,600MHz using DOCP profile in dual channel mode / X570-based motherboard using AGESA 1.2.0.6b / Samsung 980 Pro 1TB C: & 2TB D: attached via M.2 slots providing 4 lanes of PCIe 4 to each drive.
The CPU is not overclocked - the RAM's DOCP configuration and the video card's factory overclock are the only overclocks on the system.
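For the drive side, a similar back-of-envelope sketch gives the theoretical ceiling of the PCIe 4.0 x4 link each 980 Pro sits on, using the standard 16 GT/s per lane and 128b/130b encoding. It ignores packet and protocol overhead, so real transfers come in below this. Reads served from the L1 RAM cache never cross this link, which may be part of why serial reads can rise well above what the drive alone delivers.

```python
# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_link_bytes_per_sec(gt_per_s: float, lanes: int) -> float:
    """Per-direction link rate after line encoding, before packet/protocol overhead."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY / 8  # bits -> bytes


gen4_x4 = pcie_link_bytes_per_sec(16e9, lanes=4)  # PCIe 4.0: 16 GT/s per lane; about 7.88 GB/s
print(f"PCIe 4.0 x4: {gen4_x4 / 1e9:.2f} GB/s per direction")
```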
What am I missing here? I can't imagine there is any kind of hard limit on IOPS coded into PrimoCache (in fact one thread I read even had a statement from support clearly saying that there was no limit), so I'm convinced I'm missing some crucial piece of information that explains the results I'm seeing. Would someone be willing to set me straight here?
:Edited for clarification: