Hello everyone.
Just wondering: is there a guideline (ratio) for choosing an SSD size for Level-2 caching relative to the size of the HDD it will be caching?
On my computer, I use a 240 GB SSD to cache a 2 TB HDD. Is my SSD too small? Too big? Or the correct size?
I did not find this information in the documentation.
Thank you in advance for your responses.
PS: I've been using PrimoCache for over a year and it works great.
Re: SSD Level-2 Cache Size
This is really subjective for most people. I have a personal preference though: no less than 10% data coverage, and ideally 20% or higher. So if your HDD has 1 TB of data you want to cache, your 240 GB L2 (over-provisioned so it only presents a 200-220 GB volume for the L2, right?) would be right around 20%.
When you go too far under 20%, your hit rate starts to suffer because blocks get swapped in and out too much. Just my opinion; others may have their own. 10% would be my absolute minimum coverage rate, or the L2 SSD is really going to suffer write/rewrite wear.
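If you want to put numbers on that rule of thumb, here is a tiny Python sketch. The 10%/20% figures come from the reasoning above; the function itself is just an illustration, nothing built into PrimoCache:

```python
# Rough sketch of the coverage rule of thumb above (illustrative only,
# nothing built into PrimoCache).
def l2_size_range_gb(cached_data_gb: float) -> tuple[float, float]:
    """Return (minimum, ideal) L2 sizes for a given amount of hot data."""
    minimum = 0.10 * cached_data_gb  # below ~10% coverage, expect heavy block churn
    ideal = 0.20 * cached_data_gb    # ~20% or more keeps the hit rate comfortable
    return minimum, ideal

lo, hi = l2_size_range_gb(1000)      # 1 TB of data actually worth caching
print(f"L2 sweet spot: {lo:.0f}-{hi:.0f} GB")  # -> L2 sweet spot: 100-200 GB
```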
Re: SSD Level-2 Cache Size
If you have a rough estimate of how much data you touch per day (newly written data plus older data read back), then you have an indication of how much you would need to serve from the SSD without necessarily touching the HDD at all. I tend to use twice that size, and that keeps even a server running cool...
You can actually measure the data you touch by looking at the statistics in PrimoCache, which is extremely helpful. Reset them at the beginning of your measurement period and check later how much data was written and how much was read. Add the two together, multiply by two, and in my experience you should be running really smoothly.
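As a rough illustration of that "measure, then double" approach, here is a sketch in Python. The example figures are made up; the real ones come from your own PrimoCache statistics:

```python
# Sketch of the "measure, then double" heuristic (example figures are made up;
# take the real ones from your PrimoCache statistics after a reset).
def l2_from_daily_io(read_gb_per_day: float, written_gb_per_day: float) -> float:
    """Size the L2 at twice the total data touched per day."""
    touched = read_gb_per_day + written_gb_per_day
    return 2 * touched

# Example: statistics show 60 GB read and 15 GB written over one day.
print(f"Suggested L2 size: {l2_from_daily_io(60, 15):.0f} GB")  # -> 150 GB
```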
Re: SSD Level-2 Cache Size
Somebody did recommend using a cache twice the size of your RAM. But why not use a much bigger cache and a long write latency, if you can afford it and your power supply is as fail-safe as a UPS?
Re: SSD Level-2 Cache Size
Data usage on a hard drive follows the Pareto rule (80/20). Although we all have huge amounts of data and files on our drives, we actually use and re-use a very small fraction of that data on a daily basis. We use this principle in our defrag software UltimateDefrag; in fact, the software is specifically designed around it: it places the most frequently used files in the hot zones of the hard drive (the outer tracks) and offers options to defragment only those frequently used files. User tip: you do not need to defragment your entire hard drive. Save wear, tear and time by defragmenting and optimally placing only your most frequently used files.
As an example of file use following the 80/20 rule with PrimoCache: I reset my L2 cache 3 days ago, and caching 2 hard drives totalling 4 TB (2 TB used space), I have only cached 24 GB of that 2 TB, and I am in front of that PC as my main workhorse 18 hours a day. My L2 cache is 200 GB.
So my L2 is set at around 5% of my overall data. Even at that setting, it may be 1 to 2 weeks before my L2 fills up and PrimoCache starts evicting data. Even then, it is going to keep all of my hot files cached and only evict the least used data, and I will rarely notice if it has to read them off the spinner again. I am okay with 2 to 3% of my overall accesses coming off the spinner.
The arithmetic is simple: the amount read from the spinner is Total Read minus Cached Read, and the hit rate is Cached Read divided by Total Read. Even with only 24 GB of that 2 TB cached in L2 (and currently only a 2 GB L1 cache), I am hitting the cache 94.42% of the time.
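To make that arithmetic concrete, here is a small sketch; the counter values are chosen to reproduce the 94.42% figure, since the actual totals were not posted:

```python
# Hit-rate arithmetic from the PrimoCache read counters. The values below are
# chosen to reproduce the quoted 94.42%; the actual totals were not posted.
total_read_gb = 430.0   # "Total Read" counter
cached_read_gb = 406.0  # "Cached Read" counter

spinner_read_gb = total_read_gb - cached_read_gb   # data served by the HDD
hit_rate = 100.0 * cached_read_gb / total_read_gb  # % served by the cache

print(f"Read from spinner: {spinner_read_gb:.1f} GB, hit rate: {hit_rate:.2f}%")
# -> Read from spinner: 24.0 GB, hit rate: 94.42%
```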
You would rarely need an L2 larger than 10% of your total data. It will take a long time to fill up at that size (depending also on your backup routine). So just stick with nice round numbers: 5%, 10% or 15% of your total data set size; realistically, 10% is more than enough.
Re: SSD Level-2 Cache Size
I don't know who would recommend twice the size of RAM as the cache, or why. That's IMHO nonsense. What you do have to take care of is the block size of your drives, the block size of your cache, and how much RAM the cache index takes up.
Example:
If you have formatted your HDD with 512-byte blocks and use 512-byte cache blocks, you need at least 8 times more RAM for the index than with 4 KByte blocks for both formatting and caching. In fact, I use 64 KByte blocks for formatting and cache, resulting in 16 times lower RAM usage for the index than with 4 KByte blocks, or 128 times lower than with 512-byte blocks.
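To see how strongly the block size drives index RAM, here is a quick sketch. The bytes-per-entry figure is a pure assumption for illustration (PrimoCache's real index format is not documented here), but the ratios between block sizes hold regardless:

```python
# Sketch of how the cache block size drives index RAM. The bytes-per-entry
# figure is an assumption for illustration (PrimoCache's real index format
# is not documented here); the ratios between block sizes hold regardless.
BYTES_PER_INDEX_ENTRY = 16  # assumed

def index_ram_mb(cache_size_gb: float, block_size_kb: float) -> float:
    entries = cache_size_gb * 1024 * 1024 / block_size_kb  # one entry per block
    return entries * BYTES_PER_INDEX_ENTRY / (1024 * 1024)

for block_kb in (0.5, 4, 64):  # 512-byte, 4 KByte and 64 KByte blocks
    print(f"{block_kb:>4} KB blocks -> {index_ram_mb(200, block_kb):7.0f} MB index RAM")
# 512-byte blocks need 8x the entries of 4 KByte blocks and 128x those of
# 64 KByte blocks, exactly the ratios described above.
```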
Using cache blocks that are bigger than the HDD blocks makes no sense either; ideally, the two match 1:1. It is worth reformatting if needed.
However, choosing the right formatting block size depends strongly on your overall drive size as well as on the specific kind of data files you are going to store on them.
Example:
If you work mainly with large video, sound and image files, a 64 KByte block size will give you faster reads and less I/O overhead. This is really where PrimoCache shines.
Conversely, if you only work with small 1-10 KByte text files, like few-page Word documents, you might lose a lot of available disk space, because 64 KByte blocks are excessively larger than required. A compromise might be file system compression in that case; I would assume it affects performance, but I never really tested it. Then again, a workload of many small files hardly needs a fast, NVMe-SSD-cached device at all, unless you are going to process millions of them in minutes.
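Here is a quick sketch of that allocation slack; the file and cluster sizes are just examples:

```python
# Sketch of allocation slack: files occupy whole clusters, so small files
# waste most of a large cluster. File and cluster sizes are just examples.
import math

def on_disk_size_kb(file_size_kb: float, cluster_kb: int) -> int:
    """Round the file size up to whole clusters."""
    return math.ceil(file_size_kb / cluster_kb) * cluster_kb

for cluster in (4, 64):
    used = on_disk_size_kb(3, cluster)  # a 3 KB text document
    print(f"{cluster:>2} KB clusters: 3 KB file occupies {used} KB "
          f"({used - 3} KB wasted)")
# ->  4 KB clusters: 3 KB file occupies 4 KB (1 KB wasted)
# -> 64 KB clusters: 3 KB file occupies 64 KB (61 KB wasted)
```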
In my old company we used PrimoCache Server to cache a 64 TByte volume with a 2 TByte cache for video editing etc. over 10 GBit Ethernet links. The cache was a partition on a 4-drive SATA SSD array, as the server had no NVMe ports yet. The server was connected to the switch at 40 GBit, and a whole render farm was able to read files at almost full speed from the server. Especially in a render farm scenario, where hundreds of CPU cores access the very same source files, a PrimoCache-enhanced server shines to its full potential: cached files fly in from the network instantly, with almost no performance hit. Even our RAID ran beyond 2 GByte/s, but the SSDs were simply faster thanks to access times, which are magnitudes better on SSD than on HDD/RAID controllers.
I can only point you to my earlier recommendation (see a few messages above) to measure how much data you really access in a single day, double that, and you should be fine for cache size. If you have free RAM you can spend on an L1 RAM cache, fine, but what matters is the L2 cache size in relation to how much data is accessed during a single day. Measuring this is, after all, possible using PrimoCache itself, as explained earlier.
Using a long latency is fine, but carries some risk. If you have a UPS and redundant power supplies in your machine, go for it; if not, reconsider. For a read cache: no problem at all. For a write cache: beware! Note that write data that has been flushed to disk is not immediately removed from the cache; you are just setting the maximum time period over which write data is collected before it is written out to the SSD at once, which reduces wear and tear on the cache drives.
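As a purely conceptual illustration of why deferring writes reduces wear, here is a toy Python sketch. It shows the general idea of coalescing rewrites during a latency window; it is not PrimoCache's actual mechanism:

```python
# Toy illustration of deferred writes: dirty blocks are collected during a
# latency window and flushed in one batch, so repeated writes to the same
# block reach the SSD only once. This is a conceptual sketch, NOT how
# PrimoCache is actually implemented.
import time

class DeferredWriteBuffer:
    def __init__(self, latency_s: float):
        self.latency_s = latency_s
        self.dirty: dict[int, bytes] = {}  # block number -> latest data
        self.last_flush = time.monotonic()

    def write(self, block: int, data: bytes) -> None:
        self.dirty[block] = data  # rewrites coalesce: only the last version survives
        if time.monotonic() - self.last_flush >= self.latency_s:
            self.flush()

    def flush(self) -> None:
        print(f"flushing {len(self.dirty)} blocks in one batch")
        self.dirty.clear()  # a real cache would keep the data for reads
        self.last_flush = time.monotonic()

buf = DeferredWriteBuffer(latency_s=10.0)
for i in range(1000):
    buf.write(i % 50, b"x" * 4096)  # 1000 logical writes, but only 50 blocks
buf.flush()  # -> flushing 50 blocks in one batch
```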
Re: SSD Level-2 Cache Size
I assume the cache size depends NOT on the size of the main drive but on the size of the applications you usually run.
If you play Cyberpunk/Diablo III/RDR2/Civ VI, it's better to have a larger cache drive.
If it's office work, 10 GB will be enough, I would assume.
PS
My 2017 iMac with a 2 TB/128 GB Fusion (hybrid) drive works like a charm.
I even forget it has a hybrid drive, not an SSD.
Re: SSD Level-2 Cache Size
Yes, that is correct.
The best indicator, in my opinion, is to identify how much data is accessed in a given time period. For me, on a working machine, that means how much data is accessed per working day. The PrimoCache interface gives valuable insights here, as you can see how much data was accessed and how much caching has saved you from repeatedly reading the very same data. There are many scenarios in which data is accessed very often, which is the best use case for caching: game play, but also 3D rendering (think of textures and models), video editing (repeatedly reviewing the very same sections), or even AI training, which reads the very same training data over and over in an iterative process.
After all, the figure to look for is the real amount of data accessed, not the sum of repeated accesses: if a file is read 100 times, you only need to cache it once. At the end of the day you have an indication that tells you: you need 10, 100 or 1000 GByte of accessed data in a single day. Do that for a period of time, like a week or a month, to get an average and a maximum. For me, a good choice was to simply double the average (which was lower than the peaks). That gives me "working at SSD speed" almost all the time, at reasonable cost.
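If you want to automate that measurement, here is a sketch of the idea. The access log and block size are hypothetical stand-ins for whatever I/O trace you can actually capture:

```python
# Sketch of measuring the unique daily working set from an I/O trace.
# The access log and block size are hypothetical stand-ins for whatever
# trace you can actually capture.
from collections import defaultdict

access_log = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 99), (2, 99)]  # (day, block)
BLOCK_KB = 64

unique_per_day: dict[int, set[int]] = defaultdict(set)
for day, block in access_log:
    unique_per_day[day].add(block)  # repeats of the same block count once

sizes_kb = [len(blocks) * BLOCK_KB for blocks in unique_per_day.values()]
avg, peak = sum(sizes_kb) / len(sizes_kb), max(sizes_kb)
print(f"average unique working set: {avg:.0f} KB, peak: {peak} KB")
print(f"suggested L2 size (2 x average): {2 * avg:.0f} KB")
```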
Secondly, and I can only say it once again, make use of defragmentation on the HDDs behind the cache. SSD caching actually benefits dramatically from it. Regular defragmentation keeps your HDD access very fast (sequential data) and makes recovery safer in failure or accidental-formatting scenarios.
To take this even further, you may consider tools like Undelete Server, which protect an HDD from getting "fragmentation holes" in the media/partition table when files are deleted. In fact, such a tool just moves the file entry into another folder, and you decide deliberately when to actually delete the file and give up the free disk space. Following up with immediate defragmentation is the best way to go.
Doing so results in HDDs holding almost only sequential, fast and reliably readable data, which perform at optimum speed when pulling data from the HDD into the cache, and vice versa when writing to the HDD, without needing to fill holes and jump around the media.
Cheers
Axel