Sunday, November 2, 2008

A Better File System for Linux?

At the heart of every operating system is the file system that provides read/write access to data. Since 2001, Ext3 has been the mainstay of Linux file systems. But the winds of change may be blowing: a better file system is in the works.

BTRFS (pronounced "better FS") is currently under development in an effort led by Oracle engineer Chris Mason. With the support of Intel (NASDAQ: INTC), Red Hat (NYSE: RHT), HP (NYSE: HPQ) and IBM (NYSE: IBM), BTRFS could become the engine that brings next-generation file system capabilities to Linux.

"The main goal is to let it [Linux] scale for the storage that will be available," Chris Mason, Director of Linux Kernel Engineering at Oracle, told InternetNews.com. "Scaling is not just about addressing the storage but also means being able to administer and to manage it with a clean interface that lets people see what's being used and makes it more reliable."

After all, Mason noted, although hard drives are getting bigger, the error rates on those drives are not going down. 


"We need to be able to easily determine when disks have the wrong information on them," Mason said. "And we need to be able to do filesystem consistency checking and need to be able to recover from errors in a much more robust fashion than we currently do."
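Detecting "the wrong information" on a disk is commonly done by storing a checksum alongside each block when it is written and re-checking it when the block is read back. The article does not describe how BTRFS does this, so the following is only a minimal sketch under that assumption: it uses Python's `zlib.crc32` as a stand-in checksum (Btrfs's default is CRC-32C, a different polynomial), and the helper names `write_block` and `verify_block` are hypothetical.

```python
import zlib
from typing import Tuple

BLOCK_SIZE = 4096  # assumed block size, matching the 4k blocks discussed below


def write_block(data: bytes) -> Tuple[bytes, int]:
    """Hypothetical helper: return the block plus a checksum computed at write time."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, stored_crc: int) -> bool:
    """Hypothetical helper: recompute the checksum on read and compare."""
    return zlib.crc32(data) == stored_crc


if __name__ == "__main__":
    block = bytes(range(256)) * (BLOCK_SIZE // 256)  # 4096 bytes of sample data
    data, crc = write_block(block)
    print("clean block ok:", verify_block(data, crc))

    # Simulate a silent disk error: flip one bit in the stored data.
    damaged = bytearray(data)
    damaged[100] ^= 0x01
    print("damaged block ok:", verify_block(bytes(damaged), crc))
```

Because the checksum is stored separately from the data, a single flipped bit is caught on read instead of being returned silently to the application.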

With Linux's current Ext3 file system, scaling to meet the needs of large storage is a challenge for several reasons.

One of them is that Ext3 was never designed for the large data pools that enterprise and even consumer desktop users now have available to them. Mason noted that on Ext3, for every 4k of data there is a piece of metadata pointing to where that 4k of data is on the drive. So as files grow larger, so does the amount of metadata, which is not very efficient.

"BTRFS uses something called extents, which just says that from this starting position for this number of blocks, use this part of the disk," Mason said. 
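To make the difference concrete, here is a simplified model comparing one pointer per 4k block with (start, length) extents. It is an illustration, not either file system's on-disk format: real Ext3 uses direct and indirect block pointers, and real Btrfs extent records carry more fields. The function names `block_map`, `extent_map` and `resolve` are hypothetical.

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # 4k blocks, as in the Ext3 description above

Extent = Tuple[int, int]  # (starting physical block, number of blocks)


def block_map(start_block: int, n_blocks: int) -> List[int]:
    """Ext3-style model: one metadata entry per 4k block of a contiguous file."""
    return [start_block + i for i in range(n_blocks)]


def extent_map(blocks: List[int]) -> List[Extent]:
    """Extent-style model: merge runs of consecutive physical blocks
    into a single (start, length) entry."""
    extents: List[Extent] = []
    for blk in blocks:
        if extents and extents[-1][0] + extents[-1][1] == blk:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((blk, 1))  # start a new run
    return extents


def resolve(extents: List[Extent], logical_block: int) -> int:
    """Translate a block offset within the file to a physical block number."""
    offset = logical_block
    for start, length in extents:
        if offset < length:
            return start + offset
        offset -= length
    raise IndexError("logical block beyond end of file")


if __name__ == "__main__":
    n = (1 << 30) // BLOCK_SIZE  # a 1 GiB file is 262,144 blocks
    per_block = block_map(1000, n)
    print("per-block entries:", len(per_block))
    print("extent entries:", len(extent_map(per_block)))
```

For a 1 GiB file laid out contiguously, the per-block model needs 262,144 entries while the extent model needs one; a fragmented file simply produces one extent per contiguous run.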

The extents approach is more scalable and efficient than the 4k block approach of Ext3. Extents are also part of the new Ext4 file system, which is part of the upcoming 2.6.28 Linux kernel release. 

Though Ext4 adds extents, Mason noted that BTRFS adds a number of other features beyond that. Among those features are items like snapshotting, online file consistency checks and the ability to perform fast incremental backups. 

"Btrfs is a file system that we think has a lot of potential to be a key next generation file system for Linux," Ric Wheeler, file system kernel for Red Hat, told InternetNews.com. 

Wheeler noted that Red Hat engineers are actively contributing to its development and have already made substantial contributions to the project. Intel is also active in BTRFS: Imad Sousou, Director of Intel's Open Source Technology Center, told InternetNews.com that Intel both likes and contributes to Btrfs.

Intel's view is that Btrfs has the right technology and architectural decisions to evolve Linux file systems to support increased demands in areas such as performance and fault tolerance.

HP is also a key backer of the effort and is putting some of its UNIX heritage in play to help out. 

"BTRFS is very interesting to HP because its goal is to provide a core set of features that are very similar to what we have with Tru64 AdvFS and a set of features that go beyond that," said Bdale Garbee, chief technologist for HP's open source and Linux organization. 

In June of this year, HP open sourced its Tru64 AdvFS file system, which has its roots in Digital Equipment Corporation's Digital Unix. Oracle's Mason noted that he was a Tru64 user years ago and that the open sourcing of AdvFS by HP has been a major benefit to the BTRFS effort. 

Mason is now pushing to have a testable version of BTRFS available to Linux users before the end of 2008. He is also aiming to have a version of BTRFS included in the 2.6.29 Linux kernel.

"This week I pushed out the last huge format change and there are still a few small ones to come," Mason said. "Once all that is done it will make it much easier to get out to a broad number of testers."

At this point, Mason's biggest challenge in getting BTRFS together is keeping it stable. 

"There is a big volume of changes coming in because we have a lot of contributors," Mason said. "It's mostly just a software engineering effort to make sure we make it easy to test and don't introduce regressions."
