- 30 10月, 2008 1 次提交
-
-
由 Chris Mason 提交于
This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths. Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk. If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts later. * While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf. * Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages. * All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression. From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption or the 'other' field are currently used. In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit, the disk format supports u64 sized compressed extents. In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later. Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum. Compression happens at delalloc time, which is basically singled threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time. Decompression is hooked into readpages and it does spread across CPUs nicely. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 30 9月, 2008 1 次提交
-
-
由 Chris Mason 提交于
This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 26 9月, 2008 1 次提交
-
-
由 Chris Mason 提交于
Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 25 9月, 2008 37 次提交
-
-
由 David Woodhouse 提交于
Date: Tue, 19 Aug 2008 16:49:35 +0100 This disappeared when I removed the special case for '.' in btrfs_lookup() Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Woodhouse 提交于
Date: Mon, 18 Aug 2008 13:10:20 +0100 This means that subvolumes get a different fsid, and NFS exporting them works properly. Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Woodhouse 提交于
Date: Mon, 18 Aug 2008 12:01:52 +0100 Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Balaji Rao 提交于
Date: Mon, 21 Jul 2008 02:01:56 +0530 Here's an implementation of NFS support for btrfs. It relies on the fixes which are going in to 2.6.28 for the NFS readdir/lookup deadlock. This uses the btrfs_iget helper introduced previously. [dwmw2: Tidy up a little, switch to d_obtain_alias() w/compat routine, change fh_type, store parent's root object ID where needed, fix some get_parent() and fs_to_dentry() bugs] Signed-off-by: NBalaji Rao <balajirrao@gmail.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan Zheng 提交于
This trivial patch contains two locking fixes and a off by one fix. --- Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag, it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Extent alloctions are still protected by a large alloc_mutex. Objectid allocations are covered by a objectid mutex Other btree operations are protected by a lock on individual btree nodes Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
mount -o thread_pool_size changes the default, which is min(num_cpus + 2, 8). Larger thread pools would make more sense on very large disk arrays. This mount option controls the max size of each thread pool. There are multiple thread pools, so the total worker count will be larger than the mount option. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
max_inline=0 used to force the max_inline size to one sector instead. Now it properly disables inline data items, while still being able to read any that happen to exist on disk. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Christoph Hellwig 提交于
Allows to specify one or multiple device=/dev/foo options during mount so that ioctls on the control device can be avoided. Especially useful when trying to mount a multi-device setup as root. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Christoph Hellwig 提交于
Also adds lots of comments to describe what's going on here. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Linda Knippers 提交于
Send the error back to userland if the ioctl fails Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Mingming 提交于
Use btrfs_release_file instead of a put_inode call Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Remove the btrfs read_inode method, and use save_mount_options Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_bufers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
A number of workloads do not require copy on write data or checksumming. mount -o nodatasum to disable checksums and -o nodatacow to disable both copy on write and checksumming. In nodatacow mode, copy on write is still performed when a given extent is under snapshot. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-