ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

FAQ, getting help, user experience about PrimoCache
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

Hi All!

I am currently running extensive tests on a 16-bay dual 4 Gbit FC RAID subsystem directly connected to a dual hexa-core Xeon server running Windows Server 2012 R2.

In an attempt to design the best-performing system for a future server replacement in my company, I am currently investigating ways of connecting, controlling and using the storage, and what results each approach delivers. Besides many questions that are still not fully answered, I have made a lot of findings which I'd like to share here.

First, I ran a long series of tests comparing ways to set up the storage. I have therefore configured the FC enclosure either as a 2x8-drive JBOD (two FC host connections to two server ports), 2x8 RAID0, 2x8 RAID1, 2x8 RAID5 or 2x8 RAID6.

On the JBOD and RAID0 configurations I tried Storage Spaces with single or dual parity, and mirror modes with two or three copies. I also looked at a Simple storage space, but only as a performance check, as it makes no sense security-wise: there is NO chance of surviving any accident with it. Without going into detail, I can summarize that the Storage Spaces redundancy features (single or dual parity, two-way or three-way mirror) cost you a lot of disk space and performance. In fact, performance drops to as low as a quarter of what is otherwise possible, which is why I rule them ALL out at this point.

So the first result was to use the chassis RAID controller in RAID6 mode, as the trade-off over RAID5 is minimal: read speeds are identical and write speeds are about a tenth lower, but the redundancy is far better (two drive failures tolerated per 8-drive set). However, the total disk space given up to parity is 25%. While the actual numbers are not THAT important, they have been helpful in deciding which way to go among the sheer endless mixture of possibilities.
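
To put rough numbers on that capacity trade-off, here is a quick back-of-the-envelope sketch (the 4 TB drive size is just an assumed example, not our actual drives):

```python
# Parity overhead for an 8-drive set: RAID5 keeps 1 drive for parity, RAID6 keeps 2.
def usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity of a parity RAID set built from equal-size drives."""
    return (drives - parity_drives) * drive_tb

DRIVES, DRIVE_TB = 8, 4.0          # 4 TB per drive is a hypothetical example

for name, parity in (("RAID5", 1), ("RAID6", 2)):
    print(f"{name}: {usable_tb(DRIVES, DRIVE_TB, parity):.0f} TB usable, "
          f"{parity / DRIVES:.1%} of raw capacity spent on parity")
# RAID5: 28 TB usable, 12.5% of raw capacity spent on parity
# RAID6: 24 TB usable, 25.0% of raw capacity spent on parity
```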

Given that the RAID enclosure is used in RAID6 mode, Windows does not really need to take care of redundancy itself. The only risk is what happens if a whole drive set fails, e.g. through a power outage (unlikely - we have a 40 kWh 3-phase UPS behind everything, and everything has at least two PSUs) or, more likely, a broken cable/SFP link or a lost controller. This kind of testing is not done yet, but I have reasonable experience with NTFS stripes surviving such situations.

Comparing test results, it turned out that a storage space with a virtual disk in "Simple" mode is noticeably faster at writing than a "standard" Windows stripe built on dynamic disks. Dynamic disks have always felt a bit risky to me, with lots of possible problems underneath and little industry support. Read speeds are close to each other. So a storage space sounds like a better idea than dynamic-disk striping.

Next point is to decide which file system is better: NTFS or ReFS.

I am still undecided on this one. Much of the information around ReFS makes it sound like a good decision, e.g. it is said to be "self-healing" and no longer requires CHKDSK. Then there is the fact that it scrubs data from one place to another and thereby essentially defragments the volume more or less automatically over time. Plus, it increases file name/path length limits, among some other important improvements. However, there are some drawbacks to using ReFS too:

- There is still little industry support; for example, Condusiv Undelete Server does not support ReFS yet (no response yet as to whether it ever will, or when). Undelete Server can save your butt when someone deletes the wrong file or folder via an SMB share.
- ReFS is new. We don't know whether there are still bugs in it. Some users apparently had really bad experiences in the past, and I cannot even tell whether their problems have been resolved in the meantime.

So some conservative arguments speak for NTFS!

What about speed, is there any difference?
After a long set of benchmark tests, I see minimal to no difference in speed between NTFS and ReFS.
So, here are some real numbers:


ATTO Disk Benchmark 3.05
Transfer size 4 KB to 64 MB, 256 MByte total length, Queue Depth 10
2x8 RAID6 NTFS Stripe ~511 MByte/s writes, ~777 MByte/s reads
2x8 RAID6 ReFS Stripe ~515 MByte/s writes, ~775 MByte/s reads
2x8 RAID6 NTFS Simple Storage Space ~653 MByte/s writes, ~811 MByte/s reads
2x8 RAID6 ReFS Simple Storage Space ~648 MByte/s writes, ~811 MByte/s reads

Transfer size 4 KB to 64 MB, 32 GByte total length, Queue Depth 10
2x8 RAID6 NTFS Stripe ~483 MByte/s writes, ~674 MByte/s reads
2x8 RAID6 ReFS Stripe ~485 MByte/s writes, ~668 MByte/s reads
2x8 RAID6 NTFS Simple Storage Space ~485 MByte/s writes, ~749 MByte/s reads
2x8 RAID6 ReFS Simple Storage Space ~496 MByte/s writes, ~749 MByte/s reads

So in the 256 MByte transfers we see a huge write-speed difference (around +140 MByte/s, +28%) in favor of the Simple storage space over the stripe sets.
And in the 32 GByte transfers we see a clear read-performance advantage (around +75 MByte/s, +11-12%) in favor of the Simple storage space over the stripe sets.
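
For reference, those percentages come straight from the NTFS rows above (a trivial check):

```python
# Recompute the quoted deltas from the ATTO numbers above (NTFS rows).
stripe_write, simple_write = 511, 653      # MByte/s, 256 MByte run
stripe_read,  simple_read  = 674, 749      # MByte/s, 32 GByte run

print(f"writes: +{simple_write - stripe_write} MB/s "
      f"(+{simple_write / stripe_write - 1:.0%})")   # +142 MB/s (+28%)
print(f"reads:  +{simple_read - stripe_read} MB/s "
      f"(+{simple_read / stripe_read - 1:.0%})")     # +75 MB/s (+11%)
```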


Given the minimal speed differences between NTFS and ReFS, and taking into account the conservative arguments about NTFS being a proven file system with lots of third-party support, I currently tend to stay with NTFS for a while, at least until third-party support grows. In that context it's a pity that Microsoft does not reveal all the details behind ReFS to the public, as this would certainly increase its support and reliability tremendously.

While this was all about TRUE HDD performance, we can add caching on top, such as PrimoCache, or use a tiered storage model with SSDs and HDDs together in the same storage space.

In that context I found this interesting document about ReFS and Storage Spaces, indicating that tiering relies heavily on write-back caching (1 GByte by default - shouldn't that be called dangerous?) and describing how it moves data to and from the SSD storage tier:

https://blogs.technet.microsoft.com/lar ... r-2012-r2/

Another interesting document is this one, explaining step by step how to set up storage tiering:

https://blogs.technet.microsoft.com/ask ... r-2012-r2/

Tiered storage works with NTFS too. I didn't know that before... and it is another point in favor of NTFS.

While I quite like the idea of a tiered storage space, I am not sure how quickly it reacts when moving "hot" and "cold" data between the two tiers. It may not be as transparent as PrimoCache. I am also not sure how to make single SSDs alongside a true hardware RAID secure without losing too much disk space. And remember, it is still all new, and we have little experience with what happens in a disaster with it.

So for the time being I tend to use PrimoCache as a READ cache, preferably with an SSD L2. I may enable write caching again if everyone else thinks that's safe: Windows does it (1 GByte by default!), and Diskeeper does it too with its "InvisiTasking" write caching. Plus, Diskeeper 2016 apparently added RAM READ caching - but no L2 cache like PrimoCache. A tiered storage space alone, or together with Diskeeper 2016, may be similar to PrimoCache right now. But at what price point?

A dedicated cache in front of a secure "main HDD array" sounds much better and more predictable to me than tiered storage. The problem is that I can't set different resiliency levels on the two storage tiers. I may need to solve this in hardware, e.g. RAID1 mirroring on the SSDs and RAID6 on the HDDs, presenting them as two drives and building the tier from those.

I'll continue my tests and configuration "trials" with this system to see how to improve it. At least I was able to rule out some options for performance reasons. I will likely have new RAID solutions in place in the long run; this is experimental work to design the best solution.

I hope this may be interesting for some of you.
Jaga
Contributor
Posts: 694
Joined: Sat Jan 25, 2014 1:11 am

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Jaga »

I did a lot of reading on ReFS, and the impression I came away with is one that led me to trust the technology. Because of that, I gave two 1TB spindle drives to a StorageSpace volume formatted with ReFS. So far (~3 months later) it is performing flawlessly, and I don't have to worry about data corruption. NTFS has been around a long, long, long time in computer years, and while it is still a solid performer, it is time for a new format to replace it. I think ReFS is that format.

Of course, you can't make boot drives ReFS yet, so we're still limited moving forward. It will be interesting to see how MS approaches the rather large install base of W10 computers, and how it rolls out ReFS boot capability down the road.
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

It's less a lack of trust in ReFS than the lack of third-party industry support that keeps me from jumping on ReFS. As explained above, Microsoft is withholding detailed information, and seems to do so even from well-known third-party companies, which forces them to waste a lot of time on reverse engineering instead of implementing new features. Why is Microsoft doing this?

In my tests ReFS felt very stable, but so did NTFS. I will do some further tests where I pull the cable of one of the underlying disks that make up the storage space, to see how well the system handles that. I will therefore place ReFS and NTFS on the same storage space to see whether there are differences. I guess it won't be easy to create any "fault" that affects either file system at all. However, I've had occasions where NTFS partitions called for CHKDSK on reboot, and if that's a 30 TByte volume you end up with a long downtime on NTFS - the biggest reason why I can't wait to jump on the ReFS bandwagon.
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by InquiringMind »

One interesting article comparing NTFS and ReFS can be found at:

https://blogs.technet.microsoft.com/ask ... -i-use-it/

The thing that stood out about ReFS to me was its lack of disk quota support - not an issue for (most) single-user systems but surely the kiss of death on multi-user servers? Also according to an insider post, much of ReFS' code was copied from NTFS. On the one hand, if true, it means tried/tested/debugged code rather than something best avoided till the second/third Service Pack - on the other, it makes ReFS less of a New Thing (though the "purple opium-fueled Victorian horror novel" NTFS doesn't come out looking too good either...).

Having checksums on file data to detect gradual disk corruption is a good idea, but this can also be done on NTFS with utilities like HashTab or HashCheck, which offer a choice of algorithms - though not the automatic background verification provided by ReFS. Still, using these with a regular check run from the Windows Task Scheduler would seem a good extra line of defence for critical data (though it would also flood any file/block cache while verifying).
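
As a rough illustration of what such a scheduled check could look like (a minimal sketch of the idea, not how HashCheck itself works - the manifest name and layout are my own assumptions):

```python
"""Build a SHA-256 manifest of a folder tree once, then verify it on a
schedule (e.g. from Windows Task Scheduler). Purely a sketch of the idea."""
import hashlib, json, sys
from pathlib import Path

MANIFEST = "checksums.json"    # assumed manifest file name

def file_hash(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build(root: Path) -> None:
    manifest = {str(p.relative_to(root)): file_hash(p)
                for p in root.rglob("*")
                if p.is_file() and p.name != MANIFEST}
    (root / MANIFEST).write_text(json.dumps(manifest, indent=2))

def verify(root: Path) -> None:
    manifest = json.loads((root / MANIFEST).read_text())
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            print("MISSING", rel)
        elif file_hash(p) != expected:
            print("CORRUPT", rel)      # detection only - repair needs a backup

if __name__ == "__main__":
    command, folder = sys.argv[1], Path(sys.argv[2])   # e.g. build D:\data
    build(folder) if command == "build" else verify(folder)
```
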
Axel Mertes wrote:...Then there is the fact that it scrubs data from one place to another and thereby essentially defragments the volume more or less automatically over time. Plus, it increases file name/path length limits, among some other important improvements.
That "automatic defragmention" isn't a given - writing a single file under NTFS to a (near-empty) disk could result in fragmentation and I can't find anything on ReFS indicating it would behave better here. And according to this article the filename length limit in ReFS remains at 255 characters for consistency with NTFS.

Chkdsk times are certainly a problem for NTFS, especially if the "dirty bit" gets stuck, forcing a check on every reboot. Disabling AutoChk and relying on checksums to protect data might be an option to consider instead.

Aside from that, interesting to see the benchmarks (and pretty good performance for an HDD-based array) and thanks for posting your thoughts.
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

Very good information, especially in the linked blog entry.

Most important: CHKDSK run times for NTFS volumes have improved dramatically between Server 2008 R2 and Server 2012 - down from hundreds of minutes (i.e. several hours) to a few seconds?!

That supports my decision to stay with NTFS for a while.
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

Some more comments from my side:

Using tools like HashTab etc. is surely helpful, but it does not actually prevent data loss - it only helps detect it. When I copy or move data I rely on verified copies, using either xcopy or robocopy with the appropriate options, or TeraCopy with the correct options set (which is based on hash calculations too).

It must be said that a read/write cache like PrimoCache can fool you here, since such hash calculations may be based on cached data rather than on what was actually read from or written to disk.

I think such hashing must be employed at the lowest levels, preferably inside the storage media itself. But that is another story.

What was an eye-opener to me was the comment in the linked blog entry about integrity settings in ReFS being granular down to the file level. How cool is that? So we are able to have selective integrity streams on specific files and folders for important data, while leaving them off for reproducible data such as, in my case, rendered images. If only that were in NTFS...
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

Another interesting read on integrity and the possibility of recovering from the quite probable read errors:

http://blog.fosketts.net/2014/12/19/big ... -checking/

So I keep using https://www.syncovery.com to make dedicated online/nearline backups of important data, based on custom rules. Tape-based archiving and backup is used too, but it is far too slow for most of that data, so it is used mostly for final archiving.
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by InquiringMind »

An interesting article which, given the increase in hard disk sizes, seems to be becoming relevant to home users as well as enterprise veterans.
Axel Mertes wrote:Using tools like HashTab etc. is surely helpful, but it does not actually prevent data loss - it only helps detect it.
Agreed - I did try a search for tools offering Hamming or Reed-Solomon codes and turned up QuickPar, which offers Reed-Solomon verification and the ability to set the level of redundancy, up to 100%. I haven't had a chance to test it out yet, but it looks interesting.
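
For anyone who wants to see the principle rather than the tool, the same idea can be sketched with the third-party Python package 'reedsolo' (my own choice of library for illustration - QuickPar itself is a GUI tool and works quite differently, on .par files):

```python
# Conceptual Reed-Solomon demo: add ~10% parity, corrupt a couple of bytes,
# repair them. Uses the third-party 'reedsolo' package (pip install reedsolo).
from reedsolo import RSCodec

data = b"important project file contents " * 10
nsym = max(2, len(data) // 10)          # ~10% redundancy
rsc = RSCodec(nsym)

encoded = bytearray(rsc.encode(data))
encoded[5] ^= 0xFF                      # simulate bit rot in two places
encoded[40] ^= 0x0F

decoded = rsc.decode(bytes(encoded))
repaired = decoded[0] if isinstance(decoded, tuple) else decoded   # return type differs by version
assert bytes(repaired) == data
print(f"repaired; parity overhead {nsym} bytes, up to {nsym // 2} byte errors correctable per block")
```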

Your RAID-6 controller should be doing something similar (if so, does it allow you to control redundancy levels?), but data scrubbing (or patrol read), where it checks all blocks for read errors, would seem the best defence - provided it can be done without too much impact on performance.
Axel Mertes wrote:When I copy or move data I rely on verified copies, using either xcopy or robocopy with the appropriate options, or TeraCopy with the correct options set (which is based on hash calculations too).
That may do the job in verifying the destination copy (my choice of poison here is FastCopy automatically invoked by Total Commander), but what if the source copy has been corrupted?
Axel Mertes wrote:So I keep using https://www.syncovery.com to make dedicated online/nearline backups of important data, based on custom rules.
Again though, if the source data has been corrupted (e.g. "bit-flipped"), then surely such a utility would just propagate any errors? Similarly, accidental file overwrites, ransomware encryption or file corruption from other causes would be replicated onto the backup as well.

That's why I would favour file versioning, where previous copies of a file are kept (though that is available as an option in Syncovery). Even that can be defeated (e.g. by ransomware making multiple overwrites), but the odds are more in your favour.

Fortunately there are a number of free tools available for this, Aphar Backup being my choice (though AutoVer, Yadis Backup and FileHamster (non-free) offer similar features).
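
The core versioning idea is simple enough to sketch, by the way (a toy illustration only, not how any of those tools actually work; the paths are made up):

```python
# Keep the previous revision under a timestamped name instead of overwriting it.
import shutil
from datetime import datetime
from pathlib import Path

def versioned_backup(src: Path, backup_dir: Path) -> Path:
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / src.name
    if target.exists():
        stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        target.rename(backup_dir / f"{src.name}.{stamp}")   # e.g. report.docx.2016-10-16_13-33-00
    shutil.copy2(src, target)                               # copy2 keeps timestamps
    return target

# usage sketch (hypothetical paths):
# versioned_backup(Path(r"D:\projects\shot_v003.aep"), Path(r"E:\versions"))
```
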
Axel Mertes
Level 9
Posts: 184
Joined: Thu Feb 03, 2011 3:22 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by Axel Mertes »

Good comments!

Basically there are several levels of redundancy and ECC at work:

- At drive level, data is usually written with some kind of ECC to compensate for bit flips on the magnetic surface. How the drive tries to fix such bit flips is up to the vendor; maybe it overwrites or relocates the block (with corrected data). At the very least it should be able to detect and repair them (single bit flips, not outright bad sectors).

- My RAID6 controller can survive the failure of any two drives in a RAID6 set; in my case each RAID6 set has 8 drives, so any 2 out of 8 can fail. The RAID6 controller should be able to compensate for failing drives and therefore for failing blocks.

However, I am not 100% sure whether the RAID6 will identify a bit-flip read error. I can only hope that there is some kind of ECC across the drives at work in this specific RAID6 implementation - that would be an interesting question for the vendor. As far as I can tell, RAID6 allows rebuilding data from parity, but does it always verify read data against the parity, or only when a drive is missing?
THAT is a good question...

You are right: making copies likely just transports and inherits the bit flips, not necessarily identifying them and certainly not fixing them. Preventing that would actually require that when we write data, we calculate the hash BEFORE writing, then re-read what was written and compare the hashes - and the same at every step. And where is the hash stored? And does the hash have a hash...?
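
Just to make the idea concrete, here is a minimal sketch of such a write-then-verify step, with the hash kept in a sidecar file next to the data (the sidecar naming is my own convention, and a block-level cache like PrimoCache or the OS cache may well satisfy the re-read from RAM, so this verifies the write path rather than the platters):

```python
import hashlib
from pathlib import Path

def write_verified(path: Path, data: bytes) -> None:
    """Hash before writing, write, re-read, compare, then store the hash
    in a sidecar file (<name>.sha256). Sketch only - a RAM cache may serve
    the re-read, and nothing here verifies the sidecar itself."""
    expected = hashlib.sha256(data).hexdigest()
    with path.open("wb") as f:
        f.write(data)
        f.flush()              # os.fsync() would at least push it towards the device
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    (path.parent / (path.name + ".sha256")).write_text(expected)
```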

It is quite hard to believe that any application behaves like this, except specific data-duplication software (such as TeraCopy, or the DIT software we use on set to make multiple copies of camera RAW material for safety reasons).

That being so, there remains some risk of running into the bit-flip and read-error issues described in the blog entry I mentioned in my last post.

However, we may "rely" on a few "assumptions" here:

- Freshly written data should be clean; bit flips are unlikely on freshly written data and more likely on data that has been untouched for a long time. That is what scrubbing is meant to identify and resolve. We may also assume that data is written to disk with ECC in the first place at the very lowest level, so the smallest errors can be repaired from the start.

- We have important data (such as user-created project files and documents) and less important - usually reproducible - data (such as rendered images). Our strategy here is "save early, save often", which means you create version numbers of projects every now and then (v001, v002, v003, ...) and potentially extend them with time stamps such as 2016-10-16 13-33-00 or 2016-10-16.1, 2016-10-16.2, 2016-10-16.3, etc.

- Software like Diskeeper Undelete (now Condusiv Undelete, I believe) protects files of user-selectable types from being overwritten by preserving them (with versioning in the background), and on top of that we can add incremental version-preserving copies in Syncovery.

That is, after all, a pretty high level of redundancy. We are not yet going as far as remote-location copies (mission critical) or expensive "no single point of failure" designs. I believe that in my case the above strategy is at least better than a plain RAID1 mirror, as I can potentially keep more safe copies (where needed) and use the total storage capacity better. Furthermore, a RAID1 mirror would very likely just replicate a silly bit error.

Thanks for the links to the other applications; I will have a look when I find the time.
InquiringMind
Level SS
Posts: 477
Joined: Wed Oct 06, 2010 11:10 pm

Re: ReFS vs. NTFS, Stripe vs. Simple Storage Space, Diskeeper vs. PrimoCache

Post by InquiringMind »

Axel Mertes wrote:At drive level, data is usually written with some kind of ECC to compensate for bit flips on the magnetic surface. How the drive tries to fix such bit flips is up to the vendor; maybe it overwrites or relocates the block (with corrected data). At the very least it should be able to detect and repair them (single bit flips, not outright bad sectors).
Drives typically use a Hamming code for ECC - a single read/write failure should result in a retry, with repeated failures resulting in sector reallocation (see here for a detailed write-up of the process). S.M.A.R.T. statistics should be updated when this occurs and can be used with some degree of confidence to identify failing drives. Unfortunately, the RAID controllers I've come across (the el-cheapo variety, of course...) don't provide full S.M.A.R.T. stats for their constituent drives, but if yours does, then a utility like Crystal DiskInfo can show them.
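
Purely for illustration, here is a toy Hamming(7,4) round trip showing the kind of single-bit repair involved (real drive ECC uses much larger codes and works on whole sectors):

```python
# Toy Hamming(7,4): 4 data bits, 3 parity bits, corrects any single flipped bit.
def encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]            # codeword positions 1..7

def correct(cw):
    p1, p2, d1, p3, d2, d3, d4 = cw
    s1 = p1 ^ d1 ^ d2 ^ d4                         # re-check the three parity groups
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    bad = s1 + 2 * s2 + 4 * s3                     # 0 = clean, else 1-based error position
    if bad:
        cw[bad - 1] ^= 1                           # flip the offending bit back
    return cw[2], cw[4], cw[5], cw[6]              # return the data bits only

word = encode(1, 0, 1, 1)
word[5] ^= 1                                       # simulate one bit flip on the "platter"
assert correct(word) == (1, 0, 1, 1)               # the single flip is repaired
```
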
Axel Mertes wrote:However, I am not 100% sure whether the RAID6 will identify a bit-flip read error. I can only hope that there is some kind of ECC across the drives at work in this specific RAID6 implementation - that would be an interesting question for the vendor. As far as I can tell, RAID6 allows rebuilding data from parity, but does it always verify read data against the parity, or only when a drive is missing?
With parity, a single bit-flip should be detected - but two (or any even number) would not. However, that would require 2/4/6 drives to flip the same bit, which is highly unlikely, and RAID-6's second parity check should be able to cope with it. The one problem that could occur is a bit-flip or unrecoverable error during a rebuild.
Axel Mertes wrote:Freshly written data should be clean; bit flips are unlikely on freshly written data and more likely on data that has been untouched for a long time...
There is almost certainly an increasing risk to old data, but the read/write process is inherently uncertain (due to the use of PRML to increase data density), and while ECC normally protects against errors, the 10^-14 unrecoverable read rate (for consumer SATA drives) seems to reflect ECC limits more than media limitations.
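
To put that 10^-14 figure in perspective, a rough back-of-the-envelope calculation (the drive and array sizes are hypothetical examples):

```python
# Chance of hitting at least one unrecoverable read error (URE) when reading
# a given amount of data, at the commonly quoted 1-per-10^14-bits rate.
URE_PER_BIT = 1e-14

def p_at_least_one_ure(terabytes: float) -> float:
    bits = terabytes * 1e12 * 8
    return 1 - (1 - URE_PER_BIT) ** bits

for tb in (4, 7 * 4):   # one 4 TB drive, and the ~28 TB read while rebuilding an 8-drive set
    print(f"{tb:>2} TB read -> {p_at_least_one_ure(tb):.0%} chance of at least one URE")
# roughly 27% and 89% - which is why scrubbing and the second parity matter
```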

Since the last post, I gave HashCheck and QuickPar a quick(ish) test and found the following:
  • HashCheck is pretty fast (took about 3 hours on 930GB of data, limiting factor being the 150MB/s hard disk, CPU usage never went above 3%) and simple to use. I thought at first nothing was happening (had to use ProcessHacker to confirm it was actually scanning the files) because I missed the progress dialog, which is very small and easy to overlook (it only gives 2 progress bars, overall and per-file).

    The (MD5) hash file was 350K in size and took about the same time to verify (3 hours) as to build. No ability to repair errors, no estimate of scan/build time and verification doesn't highlight any new files added since the checksum build (renamed files are flagged as unreadable though). So this seems best suited for periodic checks on static data like backups.
  • QuickPar was a comparative disappointment. It won't work on folders, you have to select individual files (it was possible to work around this limitation using Total Commander to produce a file list, but this was only accepted once - subsequent attempts didn't show QuickPar in the right-click menu except for "small" 3-4 file selections).

    Initially it seemed quite fast - but it turns out to run in stages: a file scan, a memory test, and then the actual (slower) checksum generation (which did max out a CPU core and used 1GB of RAM). The progress dialog is very limited in that it only shows per-file progress and estimated processing time. Testing with 13GB of data took an hour to generate the .par files, which took up 2GB (I increased the redundancy from the default 10% to 15%, so the size seems right). Unfortunately, verification failed, reporting all files as missing.

    It turns out that the files have to be in the same folder as the .par files for verification and repairing to work, making it unsuitable for file/folder checksumming. Given the emphasis on Usenet posts in the instructions, this seems understandable but a disappointment nonetheless.