Biz & IT —

Microsoft introduces new robust “Resilient File System” for Windows Server 8

Microsoft has revealed details of its next generation Windows file system: …

Microsoft introduces new robust

Storage Spaces will give Windows 8 flexible, fault-tolerant pooling of disk space, and will make storage management simpler and much more powerful. But there's more to robust file storage than replicating data between disks: preventing and detecting corruption, and ensuring that damage to one file does not spread to others are also important. When describing Storage Spaces, Microsoft was silent on how it hoped to tackle these needs. The answer has now been revealed: a new file system, ReFS (from "Resilient File System").

Storage Spaces make it easy to cope from a failed disk, but are no help if a disk is merely producing bad data. The Storage Space will be able to tell you if two mirrored drives differ or if the parity check fails, but have no way of determining which drive is right and which is wrong. Erasures, where the data is missing altogether, can be corrected; errors, where the data is wrong, can only be detected.

ReFS is designed to pick up where Storage Spaces leave off. To protect its internal data structures, file system metadata, and, optionally, user data against corruption, ReFS calculates and stores checksums for the data and metadata. Each piece of information protected by the checksum is fed into a checksum algorithm, and the result is a number, the checksum; in ReFS's case, the checksum is a 64-bit number. Checksum algorithms are designed such that a small change in the input causes a large change in the resulting checksum.

Every time ReFS reads file system metadata (or data that has opted in to the checksum protection) it will compute the checksum for the information it has read, and compare this against the stored value. If the two are in agreement then the data has been read correctly; if they aren't, it hasn't.

This checksum protection guards against a range of problems. When writing, two particular issues are "lost writes," where the data never makes it to disk, and "misdirected writes," where a bug in either the file system driver or the firmware of the drive or its controller causes a write to go to the wrong location on disk. When reading data, the biggest concern is "bit rot"—the corruption of correctly-written data due to failures of the disk's magnetic storage.

This solves the problem of not knowing which side of a mirrored pair is the correct one. When used with a Storage Spaces mirror, ReFS will test the checksums of each side of the mirror independently, and then use this to determine which one is correct and which is not.

To further improve protection against bit rot, ReFS will perform scrubbing: it will periodically read all the data and metadata in a volume, verify the checksums are correct, and if necessary use mirrored copies to repair the bad data.

ReFS is also designed to provide greater protection against power failures during write operations. Windows' current file system, NTFS, performs in-place updates of its data structures and metadata. For example, if a file is renamed, NTFS will read the sector of the drive that contains the filename, calculate a new sector with the new filename, and then write the whole sector back out. This means that a whole sector could be corrupted if the write is interrupted by a power failure, a problem known as "torn writes". Traditionally, this meant that 512 bytes could be damaged; on modern hard drives, that's now increased to 4096 bytes. ReFS will work differently; instead of updating the metadata in-place, it will write new metadata to a different location, preventing damage from torn writes.

Like NTFS, ReFS will also prevent cascading failures. If an uncorrectable problem does occur—corruption of data that isn't mirrored, or has no valid mirrors—ReFS will be able to make a record of the problem and then remove access to the damaged data, without taking the volume offline or interrupting access to any other data on the same volume.

The decision to change file systems is not one to be taken lightly. A bug in a file system can cause catastrophic data loss for millions of people. The two file system implementations with the most real-world testing are both Microsoft's; its NTFS and FAT32 drivers have more users than any other file system drivers in the world. That's not something to be discarded on a whim: the enormous amount of real-world testing means that bugs and unusual corner case situations are more likely to have been found—and fixed—in those file systems than any other.

Microsoft is not discarding these tried and trusted NTFS code entirely with ReFS. The ReFS driver will take parts of the existing NTFS driver, including its API and handling of caching and security, and re-use them. Only the lower-level parts of the driver, the parts concerned with how data is laid out on disk, have changed.

Understandably, Microsoft is taking a conservative approach to rolling out ReFS support. Only Windows Server 8 will include the ReFS; desktop-oriented Windows 8 will stick with NTFS. Just as with Storage Spaces, ReFS will not be usable as a boot drive, either. Future versions of Windows will extend ReFS support to the client, and eventually make it bootable. Windows Server 8 will also have no facility for converting NTFS volumes to ReFS; creating a new volume and copying data will be the only migration path.

More surprisingly, ReFS contains a number of feature regressions relative to NTFS. Most remarkably, it doesn't support hard links, a feature found on NTFS and all UNIX file systems that allows one file to be given multiple names at multiple paths. This is particularly surprising, because Windows uses hard links extensively for its side-by-side storage of different library versions. Using hard links in this way is relatively new to Windows; Windows Vista was the first to do so.

Another new-in-Windows Vista feature, transactional updates to user data, also won't be included. Windows Vista and Windows 7 allow database-like transactions to be made to the file system, wherein a batch of updates to the file system can be made atomically. Per-file encryption and per-file compression also aren't supported in ReFS. Nor are named data streams, handy as these are for working with Mac OS X clients. Storage quotas are also gone.

Not all the missing features are surprising. Two features that exist only for backwards compatibility—generation of short (8.3) filenames, for DOS compatibility, and support for Extended Attributes, for OS/2 compatibility—are removed.

While the backwards compatibility features can safely stay gone forever, it's hard to see ReFS accepted as a full-on NTFS replacement until the other omissions are resolved.

In spite of its limited availability and feature set, Microsoft says that ReFS will be production-ready, and decidedly not a beta, at Windows Server 8's launch. In addition to the re-use of existing code, the company has also tested the new file system extensively, claiming that it has achieved an "unprecedented level of robustness" for a Microsoft file system.

Though Microsoft has continued to improve NTFS in Windows Vista and Windows 7, it has looked increasingly long in the tooth, particularly when compared to Oracle's ZFS and Linux's btrfs. Both of these file systems have offered flexible storage handling, and integrated fault tolerance, advancing the state of the art of mainstream file systems. Storage Spaces go a long way towards providing Windows with comparably capable volume management. ReFS fills most of the remaining gaps, adding a file system structure that's fundamentally more robust than current NTFS.

Listing image by Photograph by Beer Coaster

Channel Ars Technica