GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Suggestions around PrimoCache
Logic
Level 5
Posts: 47
Joined: Mon Oct 29, 2018 2:12 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by Logic »

cichy45 & InquiringMind

You guys are missing the point! 🙂
Let me explain:

Windows will automatically use any spare DRAM as a file-aware read cache, regardless of whether you have PrimoCache installed or not.
(That's assuming your system drive is an HDD, or that you have edited the registry to force that read caching.)
ie:
~all your DRAM is already used (dynamically) anyway.
What is NOT used as extra/spare RAM is any spare GPU VRAM...

So, with GpuRamDrive you can add a lower (slower) tier of disk caching to further speed up I/O, in much the same way PrimoCache normally uses an SSD...
but with no SSD required = wider customer base for Romex...

Ideally, said cache would also be dynamic, decreasing as the GPU's VRAM is required for graphics.
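A sizing policy along those lines could be sketched as follows (a purely hypothetical illustration; the function name, the idea of polling VRAM usage, and the `reserve_mb` headroom are my assumptions, not anything PrimoCache or GpuRamDrive actually implements):

```python
def target_cache_size_mb(total_vram_mb, used_by_graphics_mb, reserve_mb=512):
    """Shrink the VRAM disk-cache target as graphics workloads claim VRAM.

    Keeps `reserve_mb` free for sudden graphics allocations; never negative.
    """
    free_mb = total_vram_mb - used_by_graphics_mb
    return max(0, free_mb - reserve_mb)

# An 8 GB card doing desktop work (1.5 GB in use) leaves a large cache...
print(target_cache_size_mb(8192, 1536))   # 6144
# ...which collapses when a game claims 7 GB.
print(target_cache_size_mb(8192, 7168))   # 512
```

A real implementation would have to re-evaluate this on every significant VRAM allocation, which is exactly the dynamic behaviour being asked for.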
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by InquiringMind »

Logic wrote: Wed Jun 09, 2021 11:21 am...Windows will automatically use any spare DRAM as a file-aware read cache, regardless of whether you have PrimoCache installed or not...~all your DRAM is already used (dynamically) anyway.
What is NOT used as extra/spare RAM is any spare GPU VRAM...
This is actually a bad thing because:
  1. it results in multiple levels of caching, which can mean the same data being stored multiple times, plus multiple checks to determine whether a particular item has been cached;
  2. Windows' file cache takes primacy over PrimoCache. Initially both start empty and fill up with identical copies of disk data. However, since Windows' file cache is checked first, the disk reads it satisfies are never seen by PrimoCache, so PrimoCache flushes its copy of that data out. Eventually Windows' cache holds the most frequently requested data and PrimoCache the next most frequent - until an application needs a large amount of memory, which is then taken from Windows' file cache. At that point the most frequently accessed data is no longer cached at all, until it is read again and stored by PrimoCache.
If there were a way to disable Windows' own file caching, it would be of measurable benefit for Romex to use VRAM this way, but unfortunately that caching is integrated into the NTFS filesystem and extremely non-trivial to bypass.
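The eviction interplay described in point 2 can be reproduced with a toy model (illustrative only; two tiny LRU caches stand in for Windows' file cache and PrimoCache, and the capacities are arbitrary):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache tracking only which blocks are resident."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # refresh recency on a hit
            return True
        return False
    def put(self, key):
        self.data[key] = True
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

windows_cache = LRUCache(3)  # stand-in for Windows' file cache (checked first)
primo_cache = LRUCache(2)    # stand-in for PrimoCache (checked second)

def read(block):
    if windows_cache.get(block):
        return "hit: windows"  # PrimoCache never observes this access
    if primo_cache.get(block):
        windows_cache.put(block)
        return "hit: primo"
    windows_cache.put(block)   # full miss: both tiers store a copy
    primo_cache.put(block)
    return "miss"

# "A" is by far the hottest block, but its repeat hits are absorbed by
# the upper tier, so the lower tier's recency info for "A" goes stale
# and colder blocks push it out.
for block in ["A", "A", "A", "B", "C"]:
    read(block)
print("A" in windows_cache.data)  # True
print("A" in primo_cache.data)    # False
```

The hot block survives in the tier that sees the hits and is evicted from the tier that doesn't - exactly the "Windows' cache holds the most frequent data, PrimoCache the next most frequent" behaviour described above.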
Logic wrote: Wed Jun 09, 2021 11:21 am...So, with GpuRamDrive you can add a lower (slower) tier of disk caching, to further speed up I/O, in much the same way PrimoCache normally uses a SSD...
This would add a third level of caching (with the results above), but with added complications: a slower speed (so PrimoCache would have to work out which cached data was less important in order to relegate it to GPU RAM), less capacity and, if the cache were a static size, potentially crippled performance for games, which, once out of GPU RAM, would try to use system RAM instead. Since it is mostly gamers who are likely to *have* GPUs with large amounts of RAM, the loss of performance here is likely to be far more significant than any gain in disk speed.

Comparing this feature with PrimoCache's L2 caching also misses the point. L2's plus point is that it is limited by SSD size rather than RAM size, so it can be an order of magnitude larger, and since it is non-volatile it poses less risk of data loss than L1.
Logic wrote: Wed Jun 09, 2021 11:21 amIdeally, said cache would also be dynamic, decreasing as the GPUs VRAM is required for graphics.
That would avoid the game-crippling performance issue, but you would then get poorer cache performance, since a dynamic cache is harder to index/search than a static one, and cached data would be lost after running any GPU-intensive game.
JVini0166
Level 3
Posts: 15
Joined: Wed Sep 30, 2020 2:36 am

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by JVini0166 »

Would be really cool! Nice idea. I've used GpuRamDisk: I have a 4GB graphics card of which I use only 1-2GB max, so it would be great if the other 2GB were used as cache. Companies could also use 8GB/16GB cards to speed up DDR2/DDR3 servers that have already maxed out their memory, which is needed by the OS and other software.
Logic
Level 5
Posts: 47
Joined: Mon Oct 29, 2018 2:12 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by Logic »

InquiringMind wrote: Fri Jun 04, 2021 6:19 pm I'm in agreement with Cichy45 here. GPU VRAM does offer greater bandwidth than system RAM, but only for GPU-specific work (graphics rendering, GPGPU, etc). Using GPU RAM more generally (as a ramdisk or disk cache) leaves it restricted by PCIe bandwidth, which is far less than that of system RAM.

In addition, GPU RAM tends to be more expensive, and it is only comparatively recently that graphics cards have shipped with large quantities (>2GB). Even then, the amount included is fairly small compared to most motherboards' RAM capacity.

So this would represent a more expensive, slower and more limited option. A waste of time to implement I would suggest.
I don't give a flying-uck if it's slower than DRAM, InquiringMind! So are my SSD, my HDD and my flash drive/s.
But ALL of them are slower than GpuRamDrive.
Especially in the low queue-depth, random 4K territory that makes up 66% of Windblows I/O...

My RAM is full and busy elsewhere, while my GPU and its RAM sit idle, doing F-all..! Get it???
Not only that, I have a spare GPU, gathering dust, that could be used.

Seems to me you're so keen on a good argument that logical, deductive thinking escapes you when you eagerly start banging your 'superior intellect' into a keyboard!?
This may be difficult for you to grasp, but it's OK for people besides you to have a good idea... really it is!
Jaga
Contributor
Posts: 692
Joined: Sat Jan 25, 2014 1:11 am

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by Jaga »

It *may* be a good idea, but I think InquiringMind suggested it should be a separate product, which I would agree with - or at best an add-on product. And the market segment for it would be much smaller than the one PrimoCache currently has, so developing it as a paid product (for a demographic that typically doesn't like to pay high prices for utility software) doesn't seem very attractive.

Even if produced as a recurring-subscription add-on to pay for development, it would only be useful for gamers. Rendering uses large amounts of drive space and keeps the GPU busy with other tasks, so digital artists probably wouldn't benefit. When gaming, vRAM is more in use, which limits the cache space you could devote to it. The only utility you might see out of it is as a gamer when not gaming, which sort of goes against the "useful for a demographic" idea.

Theoretically it's good on paper; realistically it's a mess. No one's going to purchase a 30XX or equivalent card just to utilize its vRAM when half the cost could be invested in more system RAM (or RAM + motherboard, for still less than the video card). I currently own a 2080 Ti with 11GB of vRAM, and I can pretty confidently say I wouldn't enable a vRAM cache, since a good chunk of the vRAM is in use while I game. When I'm not gaming, I don't really need an additional ~6 to 8 GB of cache space, due to having more than 32GB of RAM in the system (64 right now). PrimoCache on a correctly built/configured system is more than effective enough right now.

The fallacy behind thinking the product is needed is that gaming enthusiasts spend almost as much on their video card as on the rest of their system, which produces a poor build most of the time. Some people still try to get away with 8GB of RAM, which I just have to facepalm at. ~12 years ago I started recommending that friends get no less than 16GB, and ~7 years ago I recommended no less than 32GB. Right now, for enthusiast gamers, I recommend 64GB or more, and that they try to source slightly used video cards (mining having driven up prices). Part of that recommendation is for the use of PrimoCache, and for the longevity of the build.

Bottom line for me: despite being an avid optimization nut, I still don't think the product would be profitable, nor do I think the market for it would be significant. And for gamers who actively use their vRAM, it goes somewhat contrary to building a system effectively - i.e. buying a 30XX video card and then putting ~16 GB of RAM on the motherboard is absolutely a misconfigured system.
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by InquiringMind »

Logic wrote: Sun Aug 08, 2021 12:09 pm ...My RAM is full and busy elsewhere, while my GPU and its RAM sit idle, doing F-all..! Get it???
Not only that, I have a spare GPU, gathering dust, that could be used...
Then accept that you have over-specified/over-spent on your GPU: sell it, replace it with one that has only the VRAM you need, and use the money raised to get more motherboard RAM.

And sell that spare GPU also, before prices return to normality.
Jaga wrote: Mon Aug 09, 2021 1:50 am Theoretically it's good on paper; realistically it's a mess. No one's going to purchase a 30XX or equivalent card just to utilize its vRAM when half the cost could be invested in more system RAM (or RAM + motherboard, for still less than the video card). I currently own a 2080 Ti with 11GB of vRAM, and I can pretty confidently say I wouldn't enable a vRAM cache, since a good chunk of the vRAM is in use while I game...
This reinforces the point made above - the only people with large amounts of VRAM will be gaming enthusiasts or GPGPU users, both of whom will want/need that VRAM for gaming/mining.

If VRAM dropped in price enough to allow GPUs with more RAM than motherboards can currently accommodate (128-256GB at the moment), then that could change matters (given the current price of an Nvidia 3080 Ti, it *ought* to come with that much memory...), but it's more likely that by then we'll be looking at motherboard capacities in TBs.
vlbastos
Level 4
Posts: 21
Joined: Sat Jul 16, 2022 10:32 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by vlbastos »

A GeForce RTX 3070 Mobile with 8GB of 256-bit GDDR6 (448 GB/s bandwidth) lying around doing nothing - why not test it?

Bad performance, but a good idea. For proof-of-concept software, it's quite amazing. Imagine how it would perform with proper optimizations. By the way, I also tested building a ramdisk in normal system RAM, and the performance was almost as bad as with VRAM, which suggests the ImDisk Virtual Disk Driver is not a good ramdisk driver - and that GpuRamDrive isn't the only one to blame for the poor performance.

Also, using VRAM as the cache means you do NOT have to spend system RAM on PrimoCache - and the more system RAM left available, the better. People with less RAM but lots of unused VRAM could have the best of both worlds. You could also add some kind of management to switch the VRAM cache off if the GPU starts using more VRAM, or to toggle the VRAM cache off (or on) based on a list of executables.
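That executable-list toggle could be as simple as a name check (a hypothetical sketch; the process names are made up, and how the running-process list is obtained is left out, since process enumeration is platform-specific):

```python
# Hypothetical policy: disable the VRAM cache while any blocklisted
# executable (e.g. a VRAM-hungry game) is running.
VRAM_HUNGRY_APPS = {"cyberpunk2077.exe", "blender.exe"}

def vram_cache_enabled(running_processes):
    """Return False if any blocklisted executable is currently running."""
    return not (VRAM_HUNGRY_APPS & {p.lower() for p in running_processes})

print(vram_cache_enabled(["explorer.exe", "firefox.exe"]))        # True
print(vram_cache_enabled(["explorer.exe", "Cyberpunk2077.exe"]))  # False
```

A daemon would poll (or subscribe to) process-start events and tear the cache down or bring it back accordingly.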

You guys are missing two important technologies available today that would definitely help with using VRAM as a cache: Resizable BAR and DirectStorage.

1. Resizable BAR:
https://www.rockpapershotgun.com/what-i ... you-use-it
https://docs.microsoft.com/en-us/window ... ar-support
https://en.wikipedia.org/wiki/PCI_configuration_space

2. DirectStorage:
https://devblogs.microsoft.com/directx/ ... ing-to-pc/
https://devblogs.microsoft.com/directx/ ... ble-on-pc/

2.1. Nvidia RTX IO (Nvidia's DirectStorage implementation):
https://techreport.com/news/3473104/wha ... ia-rtx-io/

2.2. AMD Smart Access Storage (AMD's DirectStorage implementation):
https://www.digitaltrends.com/computing ... s-storage/

And if you compare the raw speeds of a PCI Express 4.0 x16 link against DDR4-3200 (my laptop uses both), PCI Express has the upper hand:

DDR4 3200: 25600 MB/s (https://en.wikipedia.org/wiki/DDR4_SDRAM)
PCI Express 4.0 x16: 31.5 GB/s (https://en.wikipedia.org/wiki/PCI_Express)
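For what it's worth, both headline figures can be derived from first principles (theoretical one-direction peaks; real-world throughput is lower, and note the DDR4 figure is per channel - a dual-channel laptop would have roughly twice that on the DRAM side):

```python
# DDR4-3200: 3200 MT/s on a 64-bit (8-byte) channel.
ddr4_3200_mb_s = 3200 * 8                    # 25600 MB/s per channel
print(ddr4_3200_mb_s)

# PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 16 lanes, 8 bits/byte.
pcie4_x16_gb_s = 16 * (128 / 130) * 16 / 8   # GB/s
print(round(pcie4_x16_gb_s, 1))              # 31.5
```

So the quoted numbers check out, but only against a single DDR4 channel.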

So I think it's worth giving it a go, using these newest technologies.

Here are my results with a 1280MB GpuRamDrive:
------------------------------------------------------------------------------
CrystalDiskMark 8.0.4 x64 (C) 2007-2021 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
SEQ 1MiB (Q= 8, T= 1): 2150.918 MB/s [ 2051.3 IOPS] < 3758.15 us>
SEQ 1MiB (Q= 1, T= 1): 1997.493 MB/s [ 1905.0 IOPS] < 524.66 us>
RND 4KiB (Q= 32, T= 1): 267.776 MB/s [ 65375.0 IOPS] < 473.60 us>
RND 4KiB (Q= 1, T= 1): 129.255 MB/s [ 31556.4 IOPS] < 31.59 us>

[Write]
SEQ 1MiB (Q= 8, T= 1): 3145.389 MB/s [ 2999.7 IOPS] < 2448.16 us>
SEQ 1MiB (Q= 1, T= 1): 2904.572 MB/s [ 2770.0 IOPS] < 360.73 us>
RND 4KiB (Q= 32, T= 1): 382.770 MB/s [ 93449.7 IOPS] < 331.38 us>
RND 4KiB (Q= 1, T= 1): 152.177 MB/s [ 37152.6 IOPS] < 26.80 us>

[Mix] Read 70%/Write 30%
SEQ 1MiB (Q= 8, T= 1): 2218.394 MB/s [ 2115.6 IOPS] < 3757.53 us>
SEQ 1MiB (Q= 1, T= 1): 2121.866 MB/s [ 2023.6 IOPS] < 493.77 us>
RND 4KiB (Q= 32, T= 1): 279.845 MB/s [ 68321.5 IOPS] < 453.09 us>
RND 4KiB (Q= 1, T= 1): 135.992 MB/s [ 33201.2 IOPS] < 30.01 us>

Profile: Default
Test: 1 GiB (x5) [R: 0% (0/1280MiB)]
Mode: [Admin]
Time: Measure 5 sec / Interval 5 sec
Date: 2022/07/16 17:35:43
OS: Windows 11 [10.0 Build 22000] (x64)
vlbastos
Level 4
Posts: 21
Joined: Sat Jul 16, 2022 10:32 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by vlbastos »

Now here's a wild idea: suppose you can get VRAM cache performance that is about 60% of L1 performance. Now suppose you can mirror the L1 to VRAM, making a kind of RAID1 cache out of L1+VRAM. You can then read from both at the same time, at VRAM speed, getting 120% of the original L1 performance. Doesn't sound bad at all.

Now imagine you can optimize the VRAM cache to the point of getting 80-90% L1 performance. And you make a mirror. And your read speeds are 160-180% of the single L1 cache. What about that?

You could even add a new tier to the overall cache: the L1+VRAM tier on top (180% of system-RAM performance, up to the VRAM size, when you aren't using the VRAM for graphics), plain L1 below it (a lot more GBs than VRAM), then L2. Sounds cool.

Edit: getting even wilder: an L1+VRAM RAID0 cache on top, plain L1 below it - squeezing all the I/O throughput you can get from both system RAM and PCI Express x16 at the same time, reading or writing.
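The mirror-read numbers above can be put into a back-of-the-envelope model (speeds normalized so L1 = 1.0; the assumption is that a mirrored read can be split across both copies and served fully in parallel):

```python
def even_split_throughput(b_fast, b_slow):
    """RAID1-style mirror read with a naive 50/50 split: both halves are
    issued in parallel, so the read finishes when the slower copy does."""
    return 1.0 / max(0.5 / b_fast, 0.5 / b_slow)

def proportional_split_throughput(b_fast, b_slow):
    """Split reads in proportion to each copy's speed: both halves finish
    together, so the two throughputs simply add."""
    return b_fast + b_slow

# Speeds normalized to L1 = 1.0 (100% of the system-RAM cache's speed).
print(round(even_split_throughput(1.0, 0.6), 3))   # 1.2 -> the 120% figure
print(proportional_split_throughput(1.0, 0.6))     # 1.6 -> 160%
print(proportional_split_throughput(1.0, 0.9))     # 1.9
```

An even 50/50 split is limited by the slower copy (giving the 120% figure for a 60%-speed VRAM cache), while a split proportional to each copy's speed lets the throughputs add, i.e. 160% in the same scenario.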
vlbastos
Level 4
Posts: 21
Joined: Sat Jul 16, 2022 10:32 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by vlbastos »

And here is a very informative walkthrough from the Nvidia Developer blog on the CPU accessing VRAM through the PCIe bus:
https://developer.nvidia.com/blog/optim ... ible-vram/

"...effectively use a CPU thread as a copy engine. This can be achieved by creating the DX12 UPLOAD heap in CVV by using NVAPI. CPU writes to this special UPLOAD heap are then forwarded directly to VRAM, over the PCIe bus (Figure 3)."

Figure 3. Preloading a VB to VRAM using CPU writes in a CPU thread

"For DX12, the following NVAPI functions are available for querying the amount of CVV available in the system, and for allocating heaps of this new flavor (CPU-writable VRAM, with fast CPU writes and slow CPU reads):

NvAPI_D3D12_QueryCpuVisibleVidmem
NvAPI_D3D12_CreateCommittedResource
NvAPI_D3D12_CreateHeap2
These new functions require recent drivers: 466.11 or later."
vlbastos
Level 4
Posts: 21
Joined: Sat Jul 16, 2022 10:32 pm

Re: GpuRamDrive: GPU VRAM as RAMdisk: Game changer!?

Post by vlbastos »

And here is a nine-year-old article on CUDA Unified Memory and Unified Virtual Addressing:
https://developer.nvidia.com/blog/unifi ... in-cuda-6/

And here are a few articles on CUDA Unified Memory for beginners, and on maximizing Unified Memory performance in CUDA:
https://developer.nvidia.com/blog/unifi ... beginners/
https://developer.nvidia.com/blog/maxim ... ance-cuda/

"Performance Through Data Locality
By migrating data on demand between the CPU and GPU, Unified Memory can offer the performance of local data on the GPU, while providing the ease of use of globally shared data. The complexity of this functionality is kept under the covers of the CUDA driver and runtime, ensuring that application code is simpler to write. The point of migration is to achieve full bandwidth from each processor; the 250 GB/s of GDDR5 memory is vital to feeding the compute throughput of a Kepler GPU."


We are at CUDA 11.7.99 nowadays, by the way.