Commits · 49cf6f4529b7945ef51b8e39f0bac630726f8c96 · xiphi1978 / linux

30 Sep, 2009 1 commit

Btrfs: Fix setting umask when POSIX ACLs are not enabled · 49cf6f45

Authored by Chris Ball on Sep 29, 2009

We currently set sb->s_flags |= MS_POSIXACL unconditionally, which is
incorrect -- it tells the VFS that it shouldn't set umask because we
will, yet we don't set it ourselves if we aren't using POSIX ACLs, so
the umask ends up ignored.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
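
The umask handoff described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct fs_model`, its fields, and `effective_create_mode()` are hypothetical names and not kernel API, and the ACL-enabled path assumes that no default ACL exists on the parent directory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct fs_model {
    bool sets_posixacl_flag; /* stands in for MS_POSIXACL in sb->s_flags */
    bool acl_enabled;        /* stands in for POSIX ACL support being in use */
};

/* Permission bits a newly created file ends up with in this model. */
unsigned int effective_create_mode(const struct fs_model *fs,
                                   unsigned int requested,
                                   unsigned int umask_bits)
{
    assert(fs);
    unsigned int mode = requested & 07777u;

    /* Flag not set: the generic VFS code applies the umask. */
    if (!fs->sets_posixacl_flag)
        return mode & ~umask_bits;

    /* Flag set and ACLs in use: the filesystem applies the umask
     * itself (assuming no default ACL on the parent directory). */
    if (fs->acl_enabled)
        return mode & ~umask_bits;

    /* Flag set but ACLs not in use: nobody applies the umask.
     * This is the situation the commit message describes. */
    return mode;
}
```

With the flag set unconditionally and ACLs off, a request for 0666 under umask 022 stays 0666 in this model; once the flag is only set when ACLs are in use, the same request yields 0644.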

22 Sep, 2009 1 commit

Btrfs: add snapshot/subvolume destroy ioctl · 76dda93c

Authored by Yan, Zheng on Sep 21, 2009

This patch adds snapshot/subvolume destroy ioctl. A subvolume that isn't being
used and doesn't contain links to other subvolumes can be destroyed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

13 Jul, 2009 1 commit

headers: smp_lock.h redux · 405f5571

Authored by Alexey Dobriyan on Jul 11, 2009

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jun, 2009 2 commits

enforce ->sync_fs is only called for rw superblock · 5af7926f

Authored by Christoph Hellwig on May 05, 2009

Make sure a superblock really is writeable by checking MS_RDONLY
under s_umount.  sync_filesystems needed some rearrangement for
that, but all but one sync_filesystem caller had the correct locking
already so that we could add that check there.  cachefiles grew
s_umount locking.

I've also added a WARN_ON to sync_filesystem to assert this for
future callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

btrfs: remove ->write_super and stop maintaining ->s_dirt · 59d697b7

Authored by Christoph Hellwig on Apr 27, 2009

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11 Jun, 2009 1 commit

Btrfs: fix -o nodatasum printk spelling · 067c28ad

Authored by Chris Mason on Jun 11, 2009

It was printing nodatacsum, which was not the correct option name.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

10 Jun, 2009 4 commits

Btrfs: autodetect SSD devices · c289811c

Authored by Chris Mason on Jun 10, 2009

During mount, btrfs will check the queue nonrot flag
for all the devices found in the FS.  If they are all
non-rotating, SSD mode is enabled by default.

If the FS was mounted with -o nossd, the non-rotating
flag is ignored.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
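
The autodetection rule above can be sketched as a plain predicate. This is an illustrative userspace model, not the kernel code: `autodetect_ssd_mode()` is a hypothetical name, the `nonrot` array stands in for each device's queue non-rotating flag, and treating an empty device list as "not SSD" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether SSD mode is enabled by default.
 * nonrot[i] stands in for the queue non-rotating flag of device i.
 */
bool autodetect_ssd_mode(const bool *nonrot, size_t nr_devices, bool nossd)
{
    assert(nonrot || nr_devices == 0);

    /* -o nossd: the non-rotating flag is ignored entirely. */
    if (nossd)
        return false;

    /* Assumption of this sketch: no devices means no autodetection. */
    if (nr_devices == 0)
        return false;

    /* SSD mode only if every device in the FS is non-rotating. */
    for (size_t i = 0; i < nr_devices; i++) {
        if (!nonrot[i])
            return false;
    }
    return true;
}
```

A single rotating device in the set is enough to leave SSD mode off in this model.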

Btrfs: add mount -o ssd_spread to spread allocations out · 451d7585

Authored by Chris Mason on Jun 09, 2009

Some SSDs perform best when reusing block numbers often, while
others perform much better when clustering strictly allocates
big chunks of unused space.

The default mount -o ssd will find rough groupings of blocks
where there are a bunch of free blocks that might have some
allocated blocks mixed in.

mount -o ssd_spread will make sure there are no allocated blocks
mixed in.  It should perform better on lower end SSDs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add mount -o nossd · 3b30c22f

Authored by Chris Mason on Jun 09, 2009

This allows you to turn off the ssd mode via remount.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) · 5d4f98a2

Authored by Yan Zheng on Jun 10, 2009

This commit introduces a new kind of back reference for btrfs metadata.
Once a filesystem has been mounted with this commit, IT WILL NO LONGER
BE MOUNTABLE BY OLDER KERNELS.

When a tree block in subvolume tree is cow'd, the reference counts of all
extents it points to are increased by one. At transaction commit time,
the old root of the subvolume is recorded in a "dead root" data structure,
and the btree it points to is later walked, dropping reference counts
and freeing any blocks where the reference count goes to 0.

The increments done during cow and decrements done after commit cancel out,
and the walk is a very expensive way to go about freeing the blocks that
are no longer referenced by the new btree root. This commit reduces the
transaction overhead by avoiding the need for dead root records.

When a non-shared tree block is cow'd, we free the old block at once, and the
new block inherits old block's references. When a tree block with reference
count > 1 is cow'd, we increase the reference counts of all extents
the new block points to by one, and decrease the old block's reference count by
one.

This dead tree avoidance code removes the need to modify the reference
counts of lower level extents when a non-shared tree block is cow'd.
But we still need to update back ref for all pointers in the block.
This is because the location of the block is recorded in the back ref
item.

We can solve this by introducing a new type of back ref. The new
back ref provides information about pointer's key, level and in which
tree the pointer lives. This information allows us to find the pointer
by searching the tree. The shortcoming of the new back ref is that it
only works for pointers in tree blocks referenced by their owner trees.

This is mostly a problem for snapshots, where resolving one of these
fuzzy back references would be O(number_of_snapshots) and quite slow.
The solution used here is to use the fuzzy back references in the common
case where a given tree block is only referenced by one root,
and use the full back references when multiple roots have a reference
on a given block.

This commit adds a per-subvolume red-black tree to keep track of cached
inodes. The red-black tree helps the balancing code find cached
inodes whose inode numbers fall within a given range.

This commit improves the balancing code by introducing several data
structures to keep the state of balancing. The most important one
is the back ref cache. It caches how the upper level tree blocks are
referenced. This greatly reduces the overhead of checking back refs.

The improved balancing code scales significantly better with a large
number of snapshots.

This is a very large commit and was written in a number of
pieces. But, they depend heavily on the disk format change and were
squashed together to make sure git bisect didn't end up in a
bad state wrt space balancing or the format change.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
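
The reference-count rule for cow'ing shared and non-shared tree blocks, as stated in this commit message, can be modeled with a toy in-memory structure. This is a sketch only: `struct tree_block`, `MAX_CHILDREN`, and `cow_tree_block()` are hypothetical names, and the model ignores back ref items, transactions, and the on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4 /* arbitrary small fan-out for the sketch */

/* Toy stand-in for a tree block and the extents it points to. */
struct tree_block {
    int refs;
    bool freed;
    size_t nr_children;
    struct tree_block *children[MAX_CHILDREN];
};

/* Copy-on-write `old` into `copy`, applying the rule from the message. */
void cow_tree_block(struct tree_block *old, struct tree_block *copy)
{
    assert(old && copy && old->refs >= 1);
    assert(old->nr_children <= MAX_CHILDREN);

    copy->refs = 1;
    copy->freed = false;
    copy->nr_children = old->nr_children;
    for (size_t i = 0; i < old->nr_children; i++)
        copy->children[i] = old->children[i];

    if (old->refs == 1) {
        /* Non-shared: free the old block at once. The copy inherits
         * its references, so lower-level counts are left untouched. */
        old->refs = 0;
        old->freed = true;
        return;
    }

    /* Shared: everything the new block points to gains a reference,
     * and the old block loses one. */
    for (size_t i = 0; i < copy->nr_children; i++)
        copy->children[i]->refs++;
    old->refs--;
}
```

In the non-shared case no child counter is written at all, which is the saving the message attributes to avoiding dead root walks.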

15 May, 2009 1 commit

Btrfs: make show_options result match actual option names · 6b65c5c6

Authored by Sage Weil on May 14, 2009

The notreelog and flushoncommit mount options were being printed slightly
differently.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

09 May, 2009 1 commit

Convert obvious places to deactivate_locked_super() · 6f5bbff9

Authored by Al Viro on May 06, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27 Apr, 2009 1 commit

Btrfs: Fix a bunch of printk() warnings. · 21380931

Authored by Joel Becker on Apr 21, 2009

Just happened to notice a bunch of %llu vs u64 warnings.  Here's a patch
to cast them all.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

25 Apr, 2009 1 commit

Btrfs: try to keep a healthy ratio of metadata vs data block groups · 97e728d4

Authored by Josef Bacik on Apr 21, 2009

This patch makes the chunk allocator keep a good ratio of metadata vs data
block groups. By default for every 8 data block groups, we'll allocate 1
metadata chunk, or about 12% of the disk will be allocated for metadata. This
can be changed by specifying the metadata_ratio mount option.

This is simply the number of data block groups that have to be allocated to
force a metadata chunk allocation. By making sure we allocate metadata chunks
more often, we are less likely to get into situations where the whole disk
has been allocated as data block groups.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
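
The counting rule behind metadata_ratio can be sketched as follows. This is a minimal userspace model: `struct chunk_alloc_state` and `note_data_chunk_alloc()` are hypothetical names, and treating a ratio of 0 as "never force" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator state for the sketch. */
struct chunk_alloc_state {
    unsigned int metadata_ratio;         /* data chunks per forced metadata chunk */
    unsigned int data_chunk_allocations; /* running count of data chunks */
};

/*
 * Record one data chunk allocation. Returns true when this allocation
 * should force a metadata chunk allocation as well.
 */
bool note_data_chunk_alloc(struct chunk_alloc_state *st)
{
    assert(st);

    st->data_chunk_allocations++;

    /* Assumption: a ratio of 0 disables forced metadata allocation. */
    if (st->metadata_ratio == 0)
        return false;

    return st->data_chunk_allocations % st->metadata_ratio == 0;
}
```

With the default of 8 described above, every eighth data chunk allocation forces one metadata chunk in this model.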

21 Apr, 2009 1 commit

btrfs: use memdup_user() · dae7b665

Authored by Li Zefan on Apr 08, 2009

Remove open-coded memdup_user().

Note this changes some GFP_NOFS to GFP_KERNEL: since copy_from_user() may
cause a page fault, it's pointless to pass GFP_NOFS to kmalloc().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

03 Apr, 2009 3 commits

Btrfs: add flushoncommit mount option · dccae999

Authored by Sage Weil on Apr 02, 2009

The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit.  This makes
the committed state a fully consistent view of the file system from the
application's perspective (i.e., it includes all completed file system
operations).  This was previously the behavior only when a snapshot is
created.

This is used by Ceph to ensure that completed writes make it to the
platter along with the metadata operations they are bound to (by
BTRFS_IOC_TRANS_{START,END}).
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: notreelog mount option · 3a5e1404

Authored by Sage Weil on Apr 02, 2009

Add a 'notreelog' mount option to disable the tree log (used by fsync,
O_SYNC writes).  This is much slower, but the tree logging produces
inconsistent views into the FS for ceph.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: introduce btrfs_show_options · a9572a15

Authored by Eric Paris on Apr 02, 2009

btrfs options can change at times other than mount, yet /proc/mounts shows the
options string used when the fs was mounted (an example would be when btrfs
determines that barriers aren't useful and turns them off.) This patch
instead outputs the actual options in use by btrfs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
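
The idea of reporting the options currently in use, rather than the string given at mount time, can be sketched in userspace. The flag names, the table, and `show_options_sketch()` below are hypothetical; the real btrfs_show_options works on kernel structures that are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bit flags standing in for the live mount option state. */
enum {
    OPT_NOBARRIER     = 1u << 0,
    OPT_SSD           = 1u << 1,
    OPT_NOTREELOG     = 1u << 2,
    OPT_FLUSHONCOMMIT = 1u << 3,
};

/*
 * Write ",name" for every option currently set into buf.
 * Returns the number of characters written (excluding the NUL).
 */
size_t show_options_sketch(unsigned int opts, char *buf, size_t len)
{
    static const struct {
        unsigned int bit;
        const char *name;
    } table[] = {
        { OPT_NOBARRIER,     "nobarrier" },
        { OPT_SSD,           "ssd" },
        { OPT_NOTREELOG,     "notreelog" },
        { OPT_FLUSHONCOMMIT, "flushoncommit" },
    };
    size_t used = 0;

    assert(buf && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (!(opts & table[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, ",%s", table[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partially written option */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Because the string is built from the live flags, an option that changes after mount (such as barriers being turned off) shows up the next time the options are printed.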

12 Feb, 2009 2 commits

Btrfs: don't clean old snapshots on sync(1) · e1df36d2

Authored by Chris Mason on Feb 12, 2009

Cleaning old snapshots can make sync(1) somewhat slow, and some users
and applications still use it in a global fsync kind of workload.

This patch changes btrfs not to clean old snapshots during sync, which is
safe from a FS consistency point of view. The major downside is that it
makes it difficult to tell when old snapshots have been reaped and
the space they were using has been reclaimed. A new ioctl will be added
for this purpose instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: process mount options on mount -o remount, · b288052e

Authored by Chris Mason on Feb 12, 2009

Btrfs wasn't parsing any new mount options during remount, making it
difficult to set mount options on a root drive.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

21 Jan, 2009 2 commits

Btrfs: removed unused #include <version.h>'s · 7eaebe7d

Authored by Huang Weiyi on Jan 21, 2009

Removed unused #include <version.h>'s in btrfs
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: cleanup fs/btrfs/super.c::btrfs_control_ioctl() · 19d00cc1

Authored by Wang Cong on Jan 21, 2009

- Remove the unused local variable 'len';
- Check return value of kmalloc().
Signed-off-by: Wang Cong <wangcong@zeuux.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17 Jan, 2009 1 commit

Btrfs: fix ioctl arg size (userland incompatible change!) · c071fcfd

Authored by Chris Mason on Jan 16, 2009

The structure used to send device in btrfs ioctl calls was not
properly aligned, and so 32 bit ioctls would not work properly on
64 bit kernels.

We could fix this with compat ioctls, but we're just one byte away
and it doesn't make sense to carry the compat ioctls around
forever at this stage in the project.

This patch brings the ioctl arg up to an evenly aligned 4k.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
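
Why an evenly sized argument avoids the 32-bit versus 64-bit mismatch can be shown with a compile-time check. The layout below is an assumption made for illustration (a 64-bit descriptor followed by a name buffer sized so the total is 4096 bytes); `struct vol_args_sketch` and `PATH_NAME_MAX_SKETCH` are hypothetical names, not the definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for the sketch: 8 + (4087 + 1) == 4096. */
#define PATH_NAME_MAX_SKETCH 4087

/* Hypothetical ioctl argument layout. */
struct vol_args_sketch {
    int64_t fd;
    char name[PATH_NAME_MAX_SKETCH + 1];
};

/*
 * The total is a multiple of both 4 and 8, so no tail padding is needed
 * whether int64_t is 4-byte aligned (some 32-bit ABIs) or 8-byte aligned.
 * Both ABIs therefore agree on the size encoded in the ioctl number.
 */
static_assert(sizeof(struct vol_args_sketch) == 4096,
              "ioctl argument should be exactly 4k");
static_assert(offsetof(struct vol_args_sketch, name) == 8,
              "name should start right after the 64-bit fd");

/* Runtime accessor so callers can log or compare the size. */
size_t vol_args_size(void)
{
    return sizeof(struct vol_args_sketch);
}
```

A struct whose payload ends one byte off an 8-byte boundary would be padded differently by the two ABIs, which is the kind of mismatch the message describes.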

16 Jan, 2009 1 commit

btrfs & squashfs: Move btrfs and squashfsto's magic number to <linux/magic.h> · 1bcbf313

Authored by Qinghuang Feng on Jan 15, 2009

Use the standard magic.h for btrfs and squashfs.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

10 Jan, 2009 1 commit

btrfs: fix for write_super_lockfs/unlockfs error handling · 0176260f

Authored by Linus Torvalds on Jan 10, 2009

Commit c4be0c1d added the ability for
write_super_lockfs to return errors, and renamed them to match.  But
btrfs didn't get converted.

Do the minimal conversion to make it compile again.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

06 Jan, 2009 3 commits

Btrfs: Fix checkpatch.pl warnings · d397712b

Authored by Chris Mason on Jan 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix a memory leak in btrfs_get_sb · 1f483660

Authored by Shen Feng on Jan 05, 2009

subvol_name should be freed if error occurs.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>

Btrfs: add permission checks to the ioctls · e441d54d

Authored by Chris Mason on Jan 05, 2009

Only root can add/remove devices
Only root can defrag subtrees
Only files open for writing can be defragged
Only files open for writing can be the destination for a clone
Signed-off-by: Chris Mason <chris.mason@oracle.com>
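
The four rules listed in this commit can be written as a small decision table. This is a hedged sketch: `enum ioctl_op_sketch` and `ioctl_allowed()` are hypothetical, and the two booleans stand in for whatever capability and file-mode checks the kernel code actually performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation names for the sketch. */
enum ioctl_op_sketch {
    OP_ADD_DEVICE,
    OP_REMOVE_DEVICE,
    OP_DEFRAG_SUBTREE,
    OP_DEFRAG_FILE,
    OP_CLONE_DEST,
};

/* Returns true when the caller may perform the operation. */
bool ioctl_allowed(enum ioctl_op_sketch op, bool is_root, bool opened_for_write)
{
    assert(op >= OP_ADD_DEVICE && op <= OP_CLONE_DEST);

    switch (op) {
    case OP_ADD_DEVICE:
    case OP_REMOVE_DEVICE:
    case OP_DEFRAG_SUBTREE:
        /* Only root can add/remove devices or defrag subtrees. */
        return is_root;
    case OP_DEFRAG_FILE:
    case OP_CLONE_DEST:
        /* The file must be open for writing. */
        return opened_for_write;
    }
    return false;
}
```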

12 Dec, 2008 1 commit

Btrfs: shared seed device · e4404d6e

Authored by Yan Zheng on Dec 12, 2008

This patch makes it possible for a seed device to be shared by
multiple mounted file systems. The sharing is achieved
by cloning the seed device's btrfs_fs_devices structure.
Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

02 Dec, 2008 2 commits

Btrfs: corret fmode_t annotations · 97288f2c

Authored by Christoph Hellwig on Dec 02, 2008

Make sure to propagate fmode_t properly and use the right constants for
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>

Btrfs: make things static and include the right headers · b2950863

Authored by Christoph Hellwig on Dec 02, 2008

Shut up various sparse warnings about symbols that should be either
static or have their declarations in scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>

20 Nov, 2008 1 commit

Btrfs: compat code fixes · 4b4e25f2

Authored by Chris Mason on Nov 20, 2008

The btrfs git kernel tree is used to build a standalone tree for
compiling against older kernels.  This commit makes the standalone tree
work with 2.6.27
Signed-off-by: Chris Mason <chris.mason@oracle.com>

18 Nov, 2008 2 commits

Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c

Authored by Chris Mason on Nov 17, 2008

Before, all snapshots and subvolumes lived in a single flat directory.  This
was awkward and confusing because the single flat directory was only writable
with the ioctls.

This commit changes the ioctls to create subvols and snapshots at any
point in the directory tree.  This requires making separate ioctls for
snapshot and subvol creation instead of combining them into one.

The subvol ioctl does:

btrfsctl -S subvol_name parent_dir

After the ioctl is done subvol_name lives inside parent_dir.

The snapshot ioctl does:

btrfsctl -s path_for_snapshot root_to_snapshot

path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
into directory and basename components.

root_to_snapshot can be any file or directory in the FS.  The snapshot
is taken of the entire root where that file lives.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
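
Splitting path_for_snapshot into directory and basename components, as the message says btrfsctl does, can be sketched with the POSIX `dirname()` and `basename()` functions. How btrfsctl actually implements the split is not shown in this log; `split_snapshot_path()` is a hypothetical helper and the 4096-byte scratch buffer is an assumption.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split `path` into its parent directory and final
 * component. Returns 0 on success, -1 if the path is empty or too long.
 * POSIX dirname()/basename() may modify their argument, so each call
 * works on a fresh scratch copy.
 */
int split_snapshot_path(const char *path, char *dir, char *name, size_t len)
{
    char tmp[4096]; /* assumed upper bound for the sketch */
    size_t n;

    assert(path && dir && name);

    n = strlen(path);
    if (n == 0 || n >= sizeof(tmp) || n >= len)
        return -1;

    memcpy(tmp, path, n + 1);
    snprintf(dir, len, "%s", dirname(tmp));

    memcpy(tmp, path, n + 1);
    snprintf(name, len, "%s", basename(tmp));

    return 0;
}
```

A relative path with no slash, such as `monday`, yields `.` as the directory, which matches the statement that the path may be absolute or relative.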

Btrfs: Seed device support · 2b82032c

Authored by Yan Zheng on Nov 17, 2008

A seed device is a special btrfs with the SEEDING super flag
set; it can only be mounted in read-only mode. Seed
devices allow people to create a new btrfs on top of them.

The new FS contains the same contents as the seed device,
but it can be mounted in read-write mode.

This patch does the following:

1) split code in btrfs_alloc_chunk into two parts. The first part makes
the newly allocated chunk usable, but does not do any operation that modifies
the chunk tree. The second part does the chunk tree modifications. This
division is for the bootstrap step of adding storage to the seed device.

2) Update device management code to handle seed device.
The basic idea is: For an FS grown from seed devices, its
seed devices are put into a list. Seed devices are
opened on demand at mounting time. If any seed device is
missing or has been changed, btrfs kernel module will
refuse to mount the FS.

3) make btrfs_find_block_group not return NULL when all
block groups are read-only.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

13 Nov, 2008 1 commit

Btrfs: mount ro and remount support · c146afad

Authored by Yan Zheng on Nov 12, 2008

This patch adds mount ro and remount support. The main
changes in this patch are: adding btrfs_remount and related
helper function; splitting the transaction related code
out of close_ctree into btrfs_commit_super; updating
allocator to properly handle read only block group.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

07 Nov, 2008 1 commit

Btrfs: Optimize compressed writeback and reads · 771ed689

Authored by Chris Mason on Nov 06, 2008

When reading compressed extents, try to put pages into the page cache
for any pages covered by the compressed extent that readpages didn't already
preload.

Add an async work queue to handle transformations at delayed allocation processing
time.  Right now this is just compression.  The workflow is:

1) Find offsets in the file marked for delayed allocation
2) Lock the pages
3) Lock the state bits
4) Call the async delalloc code

The async delalloc code clears the state lock bits and delalloc bits.  It is
important this happens before the range goes into the work queue because
otherwise it might deadlock with other work queue items that try to lock
those extent bits.

The file pages are compressed, and if the compression doesn't work the
pages are written back directly.

An ordered work queue is used to make sure the inodes are written in the same
order that pdflush or writepages sent them down.

This changes extent_write_cache_pages to let the writepage function
update the wbc nr_written count.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

30 Oct, 2008 1 commit

Btrfs: Add zlib compression support · c8b97818

Authored by Chris Mason on Oct 29, 2008

This is a large change for adding compression on reading and writing,
both for inline and regular extents.  It does some fairly large
surgery to the writeback paths.

Compression is off by default and enabled by mount -o compress.  Even
when the -o compress mount option is not used, it is possible to read
compressed extents off the disk.

If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later.

* While finding delalloc extents, the pages are locked before being sent down
to the delalloc handler.  This allows the delalloc handler to do complex things
such as cleaning the pages, marking them writeback and starting IO on their
behalf.

* Inline extents are inserted at delalloc time now.  This allows us to compress
the data before inserting the inline extent, and it allows us to insert
an inline extent that spans multiple pages.

* All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
are changed to record both an in-memory size and an on disk size, as well
as a flag for compression.

From a disk format point of view, the extent pointers in the file are changed
to record the on disk size of a given extent and some encoding flags.
Space in the disk format is allocated for compression encoding, as well
as encryption and a generic 'other' field.  Neither the encryption nor the
'other' field is currently used.

In order to limit the amount of data read for a single random read in the
file, the size of a compressed extent is limited to 128k.  This is a
software only limit, the disk format supports u64 sized compressed extents.

In order to limit the ram consumed while processing extents, the uncompressed
size of a compressed extent is limited to 256k.  This is a software only limit
and will be subject to tuning later.

Checksumming is still done on compressed extents, and it is done on the
uncompressed version of the data.  This way additional encodings can be
layered on without having to figure out which encoding to checksum.

Compression happens at delalloc time, which is basically single threaded because
it is usually done by a single pdflush thread.  This makes it tricky to
spread the compression load across all the cpus on the box.  We'll have to
look at parallel pdflush walks of dirty inodes at a later time.

Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
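
The two software limits and the "did it get smaller" rule from this message can be combined into a small sketch. The constants mirror the 128k and 256k figures quoted above; the function names and the decision to fold both checks into one predicate are assumptions of this sketch, not the kernel's structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted in the commit message (software only, subject to tuning). */
#define MAX_COMPRESSED_SKETCH   (128u * 1024u)
#define MAX_UNCOMPRESSED_SKETCH (256u * 1024u)

/* How many input bytes to feed into one compressed extent. */
size_t next_compress_input_len(size_t remaining)
{
    return remaining < MAX_UNCOMPRESSED_SKETCH ? remaining
                                               : MAX_UNCOMPRESSED_SKETCH;
}

/*
 * Keep the compressed form only if it is actually smaller than the
 * input and fits within the compressed extent limit.
 */
bool keep_compressed_result(size_t input_len, size_t output_len)
{
    assert(input_len <= MAX_UNCOMPRESSED_SKETCH);
    return output_len < input_len && output_len <= MAX_COMPRESSED_SKETCH;
}
```

When `keep_compressed_result()` returns false in this model, the caller would fall back to writing the pages uncompressed, in line with the behavior the message describes for compression that fails to shrink the data.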

30 Sep, 2008 1 commit

Btrfs: add and improve comments · d352ac68

Authored by Chris Mason on Sep 29, 2008

This improves the comments at the top of many functions.  It didn't
dive into the guts of functions because I was trying to
avoid merging problems with the new allocator and back reference work.

extent-tree.c and volumes.c were both skipped, and there is definitely
more work to do in cleaning and commenting the code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

26 Sep, 2008 1 commit

Remove Btrfs compat code for older kernels · 2b1f55b0

Authored by Chris Mason on Sep 24, 2008

Btrfs had compatibility code for kernels back to 2.6.18.  These have
been removed, and will be maintained in a separate backport
git tree from now on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

25 Sep, 2008 1 commit

Btrfs: Reinstate '-osubvol=.' option to mount entire tree · 76fcef19

Authored by David Woodhouse on Aug 19, 2008

Date: Tue, 19 Aug 2008 16:49:35 +0100
This disappeared when I removed the special case for '.' in btrfs_lookup()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>