Request: Separate L1 read and L2 write cache with safe deferred writes

tverweij
Level 6
Posts: 75
Joined: Thu May 10, 2018 9:27 am

Request: Separate L1 read and L2 write cache with safe deferred writes

Post by tverweij »

I'd like to have separate caches for read and write, where reads are cached in L1 only and writes in L2 only.

The read scenario
Data in both the L1 and the L2 cache can be used for cached reads, but the L1 cache is never flushed to L2.
Blocks read from disk are never stored in L2, and a block read from L2 is not stored in L1, as it is already cached by L2.
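A minimal sketch of that read path in Python, with hypothetical names (PrimoCache's internals aren't public, so this is illustrative only): L1 is checked first, L2 second, and a hit in either tier is never copied into the other.

class SplitReadPath:
    """Illustrative only: a two-tier read cache where L1 and L2
    hold disjoint roles, as described above."""

    def __init__(self, disk):
        self.l1 = {}      # RAM read cache: block number -> data
        self.l2 = {}      # SSD cache contents: block number -> data
        self.disk = disk  # backing store exposing read(block)

    def read(self, block):
        if block in self.l1:          # L1 hit: serve from RAM
            return self.l1[block]
        if block in self.l2:          # L2 hit: serve from SSD,
            return self.l2[block]     # without promoting into L1
        data = self.disk.read(block)  # miss: go to the underlying disk
        self.l1[block] = data         # populate L1 only; blocks read
        return data                   # from disk never land in L2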

The write scenario
Data is written immediately to L2 (never to L1); a write is only completed when it has been fully written to L2. The written data stays available in L2 until its space is needed for new writes (when there is no free space left, the oldest data is overwritten first) or until the blocks are deleted.
L2 is flushed to the underlying disk in deferred fashion. If a blue screen or power loss happens, the unwritten data in L2 is written to the disk during the boot sequence, or as soon as the disk is connected (e.g. iSCSI).
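A sketch of how that write path and the boot-time replay could fit together. The structure and method names (persist, read_dirty_entries) are my own assumptions, not PrimoCache's actual design:

from collections import OrderedDict

class WriteToL2Cache:
    """Illustrative only: writes complete on the L2 SSD and are
    flushed to the underlying disk later, in deferred fashion."""

    def __init__(self, journal, disk, capacity_blocks):
        self.blocks = OrderedDict()  # block -> data, oldest first
        self.journal = journal       # persistent journal on the L2 SSD
        self.disk = disk             # destination disk
        self.capacity = capacity_blocks
        self.dirty = set()           # blocks not yet on the disk

    def write(self, block, data):
        # Reclaim space oldest-first when L2 is full; dirty data is
        # flushed to the disk before its slot is reused, never dropped.
        if block not in self.blocks:
            while len(self.blocks) >= self.capacity:
                oldest = next(iter(self.blocks))
                if oldest in self.dirty:
                    self.flush_one(oldest)
                del self.blocks[oldest]
        self.blocks[block] = data
        self.journal.persist(block, data)  # the write completes here,
        self.dirty.add(block)              # on the SSD, never in RAM

    def flush_one(self, block):
        self.disk.write(block, self.blocks[block])
        self.dirty.discard(block)

    def deferred_flush(self):
        # Background task: trickle dirty blocks to the underlying disk.
        for block in list(self.dirty):
            self.flush_one(block)

    def recover_after_crash(self):
        # Boot-time replay after a blue screen or power loss: anything
        # still marked dirty in the on-SSD journal is written to the
        # disk before the volume is used.
        for block, data in self.journal.read_dirty_entries():
            self.disk.write(block, data)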

This caching scenario makes sure that:
1. There is as much space as possible in the L1 cache for reads, and the cache won't be thrashed by big writes.
2. All writes are completed to persistent storage (L2), never only to memory, so every write is persistent even before it is written to the destination disk.
3. All writes will be flushed to the underlying disk, even when a power loss or blue screen occurs.
4. All written data is cached, so incremental backups (which read only the changed blocks) are always read-cached from L2, won't thrash the L1 cache and won't touch the underlying disks, keeping uncached reads on those drives fast.
5. A much longer write-defer delay can be used without the risk of losing data; in Hyper-V replication scenarios this means more than 50% of the data will be trimmed and never saved to the underlying disk, saving a lot of wear on those drives (and a lot of bandwidth in the case of iSCSI drives).

Explanation
Using a very fast NVMe drive, you can speed up writes without wasting memory on them, and in this scenario all data is safe, even with deferred writes. Keeping the read cache in L1 ensures the NVMe drive always performs at full speed (no L1-to-L2 flushes) and that its complete capacity is available for write-back caching.
Last edited by tverweij on Thu Feb 08, 2024 2:10 pm, edited 1 time in total.
tverweij
Level 6
Posts: 75
Joined: Thu May 10, 2018 9:27 am

Re: Request: Separate L1 read and L2 write cache with safe deferred writes

Post by tverweij »

As the "flush on boot" feature might be problematic on the boot disk, as the drive will be active before PrimoCache, it might be handy to not support this scenario on the boot disk - or only support L2 write-through for the boot disk, to prevent data loss.
tverweij
Level 6
Posts: 75
Joined: Thu May 10, 2018 9:27 am

Re: Request: Separate L1 read and L2 write cache with safe deferred writes

Post by tverweij »

I saw there was already another thread about a safe L2: viewtopic.php?t=5574

Support answered there that it was not possible because of the number of index updates, but I think this is solved now by PCIe 4.0 NVMe drives, which approach memory speeds (about 8 GB/s, and double that in a RAID 0 or RAID 1 array).
Support
Support Team
Posts: 3627
Joined: Sun Dec 21, 2008 2:42 am

Re: Request: Separate L1 read and L2 write cache with safe deferred writes

Post by Support »

Sorry for the late reply due to the Chinese New Year holiday.
tverweij wrote: Tue Feb 06, 2024 12:26 pm
I'd like to have separate caches for read and write, where reads are cached in L1 only and writes in L2 only.
Both L1 and L2 support separate cache spaces.
tverweij wrote: Thu Feb 08, 2024 2:07 pm
Support answered there that it was not possible because of the number of index updates, but I think this is solved now by PCIe 4.0 NVMe drives, which approach memory speeds (about 8 GB/s, and double that in a RAID 0 or RAID 1 array).
Thank you for the suggestion.
nabsltd
Level 1
Posts: 1
Joined: Sat Mar 16, 2024 7:15 pm

Re: Request: Separate L1 read and L2 write cache with safe deferred writes

Post by nabsltd »

+1

I definitely feel that some sort of configuration that allows direct writes to the L2 device, with "when idle" copies of that data to the final destination, would help in a lot of use cases. Anything where you might burst 10-20GB and then not write much at all for a minute or so would be the ideal case.

Perhaps there could be some sort of hybrid system: data is written directly to L2, the changes to the indexes are written to an intent log in RAM and to the actual indexes, the RAM intent log is batched up (like deferred write works now) into multiples of 4K blocks and written to an intent log on L2, and the full index table on L2 is updated when idle. Eventually, the data from L2 gets copied to the actual storage.
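A rough sketch of that intent-log idea, with an invented 16-byte entry format and hypothetical append/apply/truncate methods; nothing here reflects PrimoCache's real index layout:

import struct

BLOCK = 4096  # batch the RAM intent log into whole 4K blocks

class IntentLog:
    """Illustrative only: index changes accumulate in RAM, are
    flushed to an on-L2 intent log in 4K multiples, and the full
    index table on L2 is rewritten only when the system is idle."""

    def __init__(self, l2_log, l2_index):
        self.ram_log = bytearray()  # pending index updates in RAM
        self.l2_log = l2_log        # append-only intent log on L2
        self.l2_index = l2_index    # full index table on L2

    def record(self, block_no, l2_offset):
        # One fixed-size entry per index change: (block, L2 offset).
        self.ram_log += struct.pack("<QQ", block_no, l2_offset)
        # Flush to L2 only in 4K multiples, like deferred write now.
        while len(self.ram_log) >= BLOCK:
            self.l2_log.append(bytes(self.ram_log[:BLOCK]))
            del self.ram_log[:BLOCK]

    def on_idle(self):
        # When idle, fold the intent log into the full index table,
        # then truncate the log so crash recovery stays short.
        self.l2_index.apply(self.l2_log.entries())
        self.l2_log.truncate()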

Having this sort of setup should come with warnings like those you now show for deferred write: don't do this without a UPS, increased writes to L2 could cause excessive wear, L2 should probably be redundant, etc.

Right now, I have 8x 4TB of spinning rust in hardware RAID6 (set to write-back with 1GB of battery-backed NV cache) that has 48GB of L1 (deferred write enabled) and 1TB of L2 (on a 3.2TB PCIe 3.0 NVMe with PLP and rated at 3 DWPD). The system has redundant power supplies, each plugged into a separate UPS. I don't fear losing data to 30 seconds or so of deferred write, nor do I worry about extra L2 writes.

The raw array gives me about 700MB/sec without PrimoCache, and as long as I burst less than 48GB, I get 4GB/sec on writes with PrimoCache. But if I write more than 48GB quickly, I drop down to only about 1GB/sec. With direct writes to L2, I could cut my L1 down a lot and still be OK: it would burst at 4GB/sec, then drop to about 2GB/sec. PrimoCache could then use the RAM I don't use for L1 for the extra overhead of the intent log.
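For what it's worth, a crude back-of-envelope model reproduces those numbers. It's illustrative only: real throughput depends on queue depth, RAID geometry and the NVMe's sustained write speed, which I'm assuming here is about 2 GB/s.

def avg_burst_speed(burst_gb, cache_gb, fast_gbps, drain_gbps):
    """Average write speed for one burst: the first cache_gb go at
    cache speed, the remainder is limited by the drain rate."""
    fast_time = min(burst_gb, cache_gb) / fast_gbps
    slow_time = max(burst_gb - cache_gb, 0) / drain_gbps
    return burst_gb / (fast_time + slow_time)

# Today: 48 GB of L1 at 4 GB/s, overflow drains at ~0.7 GB/s (raw array).
print(avg_burst_speed(100, 48, 4.0, 0.7))  # ~1.2 GB/s ("about 1GB/sec")

# Direct-to-L2 with L1 cut down to, say, 10 GB: overflow now drains at
# the NVMe's assumed ~2 GB/s instead of the array's 0.7 GB/s.
print(avg_burst_speed(100, 10, 4.0, 2.0))  # ~2.1 GB/s ("about 2GB/sec")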