Hi Support,
I have now installed two Samsung 850 EVOs as an L2 cache on an Adaptec 6445 RAID controller in RAID 0 mode. That gives me nearly 1 TByte of SSD cache running at about 1 GByte/s for both reads and writes.
The FC RAID6 I am caching is 11.5 TByte in size and uses a 4 KB block size. It's connected via 4 GBit FC, so its peak performance is around 350 MByte/s, limited by the interface.
When setting up the SSD L2 cache, the only options offered for the L2 cache block size are 16 KB, 32 KB, 64 KB, 128 KB, 256 KB and 512 KB.
Unfortunately I have not yet installed more RAM, so with only 12 GByte I physically can't even address the entire L2 cache. I will upgrade to 48 GByte of RAM soon; however, Windows Server 2008 only lets me use 32 GByte. It'll be interesting to see if the hidden 16 GByte can be utilized.
If I run the cache with 32 KByte blocks I can at least use the entire L2 cache size with the current RAM. However, the block size of the L2 cache is then 32 KB, while the RAID it caches has a block size of just 4 KB.
Does that mean that for every 4 KB block from the HDD RAID, a whole 32 KB cache block is used up on the SSD?
Wouldn't that in turn mean unnecessary wear on the SSD and a waste of 32 - 4 = 28 KB per block, i.e. 7/8 of the entire SSD cache?
Wouldn't that mean that an SSD cache with true 4 KB blocks and only ~128 GByte of capacity would have the exact same effective capacity at the end of the day?
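To make the worst case I'm worried about concrete, here is the arithmetic as a small sketch - assuming (and this is exactly the assumption I'd like you to confirm or deny) that each 32 KB L2 block only ever holds one useful 4 KB source block:

```python
# Worst-case arithmetic: each 32 KB L2 block holds only one useful 4 KB source block.
# This is the behaviour I am asking about, not a confirmed fact about the cache.
ssd_cache = 1 * 2**40      # ~1 TByte of SSD L2 cache
l2_block = 32 * 1024       # 32 KB L2 cache block size
source_block = 4 * 1024    # 4 KB block size of the FC RAID6

l2_blocks = ssd_cache // l2_block
useful = l2_blocks * source_block        # one useful 4 KB block per L2 block
print(l2_blocks)                         # 33554432 L2 blocks
print(useful // 2**30)                   # 128 -> only ~128 GByte of useful data
print(1 - source_block / l2_block)       # 0.875 -> 7/8 of the SSD cache wasted
```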
Please clarify, as this isn't really explained transparently in the help.
Would you recommend reformatting the HDD RAID6 with e.g. 256 KByte blocks to minimize the memory overhead?
I am currently running a program that analyzes the sizes of all files on our RAIDs. This way I hope to find out the impact on storage capacity/waste of switching from the small 4 KB block size to e.g. a 64 KB or even 256 KB block size in the future. We deal a lot with larger files and image sequences in HD and 4K, so that might work out in the end.
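The core of that analysis is just the per-file rounding loss; as a minimal sketch (the example file size and cluster sizes are placeholders, not our real data):

```python
# Per-file slack for a given cluster size - the real analysis sums this over all files.
def slack(file_size: int, cluster: int) -> int:
    """Bytes lost to rounding the file up to whole clusters."""
    return (cluster - file_size % cluster) % cluster

size = 10 * 1024 * 1024 + 1              # example: a file just over 10 MB
for cluster in (4 * 1024, 64 * 1024, 256 * 1024):
    print(cluster // 1024, "KB clusters ->", slack(size, cluster), "bytes wasted")
```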
Any recommendation or clarification would be highly appreciated.
Kind regards,
Axel
Re: L2 and cached HDD block size mismatch - URGENT
That's the same question I had, but from testing I don't think any space is wasted: the larger blocks get filled with extra sequential clusters which, although not used right away, might be used in the future. I was getting an 80% cache hit rate with 4x cache storage per 1x read. Of course I have no control over the read process and whether it is reading the same data over and over, but if not, I should have seen 8x L2 cache usage, since the block size is 8x the cluster size.
For small random access I think a smaller block size is the way to go, as the extra sequential clusters won't hold relevant data. But for larger data access I think there is a better chance of that data already being in the cache from previous read operations.
Something else I noticed: initially I would copy a file to a temp directory so that its data got into the cache. But after a while, copying a file no longer increases the L2 cache size. It shows the total read size increasing AND the cached reads increasing a little, which I assume is partial data already being in the cache, but the rest of the file does not get stored in L2. Reading or copying the file multiple times makes no difference. That's either a bug or some logic that overrides the initial behaviour. The program seems to have multiple logic paths for cache admission. I can see that being useful when the cache is, say, 80% full, but not at 10%.
Re: L2 and cached HDD block size mismatch - URGENT
AlienTech wrote: "That's the same question I had but from testing I don't think any space is wasted, the larger block size would fill with extra sequential clusters which, although not used right away, might be used in the future..."
I really doubt that's true. If it were true, the amount of memory required to index my cache should be related to the number of *source* blocks mirrored into the SSD cache blocks. But when I increase the block size of the SSD cache to e.g. 512 KB, the memory overhead gets extremely small - too small to contain an index for 1 TByte of cached 4 KB *source* blocks.
Let's do some math:
1 TByte / 16 KByte blocks = ~61,035,156 blocks
From earlier calculations I think about 80 bytes are needed to index one block. So I'd need 61,035,156 * 80 bytes ≈ 4,882,812,500 bytes just to index ~1 TByte of SSD cache at a 16 KB block size.
I'd need four times as much to index 4 KB source blocks - that's nearly 20 GBytes for the index alone, more than I have RAM.
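Here is the same estimate as a quick sketch, so you can check my numbers (the 80 bytes per index entry is my own assumption from earlier calculations, not an official figure):

```python
# Rough RAM needed to index ~1 TByte of L2 cache at different block sizes.
# 80 bytes per block entry is my own estimate, not an official PrimoCache figure.
cache_bytes = 10**12            # ~1 TByte SSD cache (decimal, as above)
bytes_per_entry = 80            # assumed index overhead per cache block

for block_kb in (4, 16, 32, 64, 128, 256, 512):
    blocks = cache_bytes / (block_kb * 1024)
    index_gb = blocks * bytes_per_entry / 10**9
    print(f"{block_kb:>3} KB blocks: {blocks:>13,.0f} blocks, ~{index_gb:5.2f} GByte index")
```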
Therefore I assume that there is a 1:1 relation between source blocks and cache blocks. If the latter are bigger than the source blocks, you simply waste space.
It would be extremely helpful if support or a developer could chime in here and clarify.
I wonder why PrimoCache does not allow me to use a block size smaller than 16 KByte for the L2 SSD RAID0 cache I've set up. Will that change when I add more RAM, so that I could essentially index 4 KByte blocks in RAM (20+ GBytes free, say with 32 or 48 GBytes of RAM)?
The point is, I don't want to buy RAM just to find out it doesn't work.
Re: L2 and cached HDD block size mismatch - URGENT
Axel Mertes wrote: "I really doubt that's true. If it were true, the amount of memory required to index my cache should be related to the number of *source* blocks mirrored into the SSD cache blocks..."
Not really - PrimoCache only needs to index its own blocks, so the overhead should be related to the number of PrimoCache blocks (so a bigger block size means fewer blocks and a smaller index).
When a read/write request is made to the source media (for a specific sector/cluster number), PrimoCache first needs to check whether that sector has been cached. But when PrimoCache's blocks are larger, it is simple enough (an arithmetic modulo operation) to check whether that sector would lie in a PrimoCache block.
For example, let's say we're using 32 KB PrimoCache blocks on a hard disk with a 4 KB cluster size. Each PC block will then hold 8 clusters. PC then reads and stores cluster 4002 - depending on the implementation, it might (a) read/store clusters 4000-4007 (which would be best performance-wise) or (b) read/store clusters 4002-4009.
Now a read request is made for cluster 4006. PC knows it has 8 clusters per block, so either (a) it calculates its block's starting cluster number (4000) to determine whether 4006 is cached - which in this case it is - or (b) it checks its index for the nearest matching cached cluster (4002) and calculates whether it is "near enough" to the requested cluster (4006), which again it is.
Either approach would work, but (b) would make index searches slower, so I suspect (and would be interested to hear support confirm or deny) that PrimoCache aligns its reads/writes on block-size boundaries.
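To illustrate option (a) - and this is purely my guess at the logic, not PrimoCache's actual internals - a block-aligned lookup boils down to an integer division:

```python
# Sketch of option (a): block-aligned cache lookup. Names and structures are made up
# for illustration and are not PrimoCache internals.
CLUSTERS_PER_BLOCK = 8                   # e.g. 32 KB cache blocks over 4 KB clusters

index: dict[int, int] = {}               # block's starting cluster -> location on the SSD

def block_start(cluster: int) -> int:
    """Starting cluster of the aligned cache block containing this cluster."""
    return (cluster // CLUSTERS_PER_BLOCK) * CLUSTERS_PER_BLOCK

def store(cluster: int, ssd_location: int) -> None:
    index[block_start(cluster)] = ssd_location   # storing cluster 4002 caches 4000-4007

def is_cached(cluster: int) -> bool:
    return block_start(cluster) in index

store(4002, ssd_location=0)
print(is_cached(4006))   # True  - same aligned block (4000-4007)
print(is_cached(4010))   # False - falls into the next block
```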
Re: L2 and cached HDD block size mismatch - URGENT
My math was not about holding an index for every source block but for every SSD cache block. At a 4 KB block size, even the RAM needed to index just the SSD cache blocks would exceed my physical RAM, which is why I came to that conclusion.
OK, they might read enough source blocks in a row to fill a cache block, and provided those sequential blocks are actually needed later (which could well be the case in a defragmented scenario), this would work more or less as you describe.
However, it would still be fairly inefficient.
Of course, the real miracle would be if someone implemented a true file-based cache at some point. Think of what that could do for networking...
So I stand by my original interpretation and still believe that RAM is the limiting factor here.
By the way, I wrote a small piece of software to analyze how many files we have on a fairly large volume and what sizes they are. From this I calculate how much disk space would be wasted when using a different block size.
Right now the RAID itself is using 4 KB blocks - the largest its controllers support. On the formatting side I may choose to use 64 KByte clusters, which would mean a drop from 99.9914% to 99.1329% space usage - in absolute terms about 100 GBytes lost where formerly only 1 GByte was lost. However, if we can then utilize more of the SSD cache, it's a good deal. 100 GBytes is very little on a 26 TByte partition...
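For reference, the whole analysis boils down to something like this - a minimal sketch of my little tool, with a hypothetical path and example cluster sizes:

```python
# Minimal sketch: total slack for a few cluster sizes over everything under a root path.
import os

def total_slack(root: str, cluster: int) -> tuple[int, int]:
    """Return (file_count, wasted_bytes) for the given cluster size."""
    count = wasted = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue                 # skip files we cannot stat
            count += 1
            wasted += (cluster - size % cluster) % cluster
    return count, wasted

for cluster in (4 * 1024, 64 * 1024, 256 * 1024):
    files, wasted = total_slack(r"D:\Media", cluster)    # hypothetical volume path
    print(f"{cluster // 1024:>3} KB clusters: {files} files, {wasted / 10**9:.2f} GByte slack")
```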
Re: L2 and cached HDD block size mismatch - URGENT
Axel Mertes wrote: "My math was not about holding an index for every source block but for every SSD cache block..."
Whoops - I missed the key point that you were using L2 caching. My bad. *blush* I haven't used L2 myself, so I can't offer any useful comment.