Commits · a6b0d5c8dbfd428717fc4db4c36757783f391c7b · gsplhtlxg / clone-Linux

23 Feb, 2012 1 commit

Btrfs: make sure we update latest_bdev · a6b0d5c8

Committed by Chris Mason on Feb 20, 2012

When we are setting up the mount, we close all the
devices that were not actually part of the metadata we found.

But, we don't make sure that one of those devices wasn't
fs_devices->latest_bdev, which means we can do a use after free
on the one we closed.

This updates latest_bdev as it goes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6b0d5c8
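
The latest_bdev fix above (a6b0d5c8) can be illustrated with a minimal userspace sketch in C. It is not the kernel patch: `struct dev`, `struct dev_set`, and `close_extra_devices` are hypothetical names, and choosing the kept device with the highest generation is an assumption about how "latest" is picked. The point is only that the cached pointer is recomputed from the surviving devices while the extras are closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device record; not the kernel's btrfs_device. */
struct dev {
    uint64_t generation;   /* assumed "newest wins" key */
    bool in_metadata;      /* device is referenced by the metadata we found */
    bool open;
};

struct dev_set {
    struct dev *devs;
    size_t count;
    struct dev *latest;    /* cached pointer that must never dangle */
};

/* Close devices that are not part of the metadata and, in the same pass,
 * re-point `latest` at the newest device that stays open. */
static void close_extra_devices(struct dev_set *set)
{
    struct dev *latest = NULL;

    assert(set != NULL);
    for (size_t i = 0; i < set->count; i++) {
        struct dev *d = &set->devs[i];

        if (!d->in_metadata) {
            d->open = false;          /* this one is being closed */
            continue;
        }
        if (latest == NULL || d->generation > latest->generation)
            latest = d;
    }
    set->latest = latest;             /* never refers to a closed device */
}
```

If `latest` pointed at a device that gets closed, it is replaced by a surviving one; if no device survives, it becomes `NULL` instead of dangling.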

15 Feb, 2012 1 commit

btrfs: Sector Size check during Mount · 941b2ddf

Committed by Keith Mannthey on Nov 29, 2011

Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
endless loop and hangs the machine in this situation.

My debugging has shown this sector size < page size case to be a non-trivial
situation, and a graceful exit from the situation would be nice for the
time being.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>

941b2ddf
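
A minimal sketch of the kind of check the sector size commit above (941b2ddf) describes, written as plain userspace C. `check_sectorsize` is a hypothetical helper, and returning `-EINVAL` is an assumption; the commit message only says the mount should fail gracefully.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: reject a filesystem whose sector size is smaller
 * than the running kernel's page size instead of looping forever later. */
static int check_sectorsize(uint32_t sectorsize, uint32_t page_size)
{
    assert(page_size != 0);
    if (sectorsize < page_size)
        return -EINVAL;   /* assumed error code */
    return 0;
}
```

With a filesystem built using 4k sectors, `check_sectorsize(4096, 65536)` fails on a 64K page kernel, while `check_sectorsize(4096, 4096)` succeeds.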

27 Jan, 2012 1 commit

btrfs: mask out gfp flags in releasepage · 0c4e538b

Committed by David Sterba on Jan 26, 2012

btree_releasepage is a callback and can be passed unknown gfp flags, which
may end up in kmem_cache_alloc called from alloc_extent_state. The slab
allocator will BUG_ON when the HIGHMEM or DMA32 flag is set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

    if ((mask & GFP_NOFS) == GFP_NOFS)
            mask = GFP_NOFS;

will not work and passes the unfiltered flags further, resulting in a crash at
mm/slab.c:2963

 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
 [<000000000024c810>] kmem_cache_alloc+0x204/0x294
 [<00000000001fd3c2>] mempool_alloc+0x52/0x170
 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
 [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
 [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
 [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
 [<0000000000210d84>] shrink_page_list+0x6a0/0x724
 [<0000000000211394>] shrink_inactive_list+0x230/0x578
 [<0000000000211bb8>] shrink_list+0x6c/0x120
 [<0000000000211e4e>] shrink_zone+0x1e2/0x228
 [<0000000000211f24>] shrink_zones+0x90/0x254
 [<0000000000213410>] do_try_to_free_pages+0xac/0x420
 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0c4e538b
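
To see why the GFP_NOFS check quoted in the releasepage commit above (0c4e538b) lets zone flags through, here is a small userspace sketch. All `X_GFP_*` values are made up for illustration and are not the kernel's bit values, and `mask_out_zone_bits` is only one possible way to filter the flags, not necessarily what the patch does.

```c
#include <assert.h>

/* Hypothetical flag bits; the real kernel definitions differ. */
#define X_GFP_WAIT    0x01u
#define X_GFP_IO      0x02u
#define X_GFP_DMA32   0x10u
#define X_GFP_HIGHMEM 0x20u
#define X_GFP_NOFS    (X_GFP_WAIT | X_GFP_IO)

/* The quoted check: the mask is normalized only when every GFP_NOFS bit
 * is present. If the caller cleared the IO bit, nothing is filtered. */
static unsigned int quoted_check(unsigned int mask)
{
    if ((mask & X_GFP_NOFS) == X_GFP_NOFS)
        mask = X_GFP_NOFS;
    return mask;
}

/* One possible remedy (an assumption): always strip the zone bits that
 * the slab allocator refuses to accept. */
static unsigned int mask_out_zone_bits(unsigned int mask)
{
    /* The zone bits must not overlap the bits we want to keep. */
    assert((X_GFP_NOFS & (X_GFP_DMA32 | X_GFP_HIGHMEM)) == 0);
    return mask & ~(X_GFP_DMA32 | X_GFP_HIGHMEM);
}
```

A mask such as `X_GFP_WAIT | X_GFP_HIGHMEM`, which models the loop device case where the IO bit is cleared, passes through `quoted_check` unchanged and still carries the HIGHMEM bit; the unconditional filter removes it.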

17 Jan, 2012 5 commits

Btrfs: allow for canceling restriper · a7e99c69

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for canceling restriper.  Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit.  Balance item is deleted and no memory
about the interrupted balance is kept.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a7e99c69

Btrfs: allow for pausing restriper · 837d5b6e

Committed by Ilya Dryomov on Jan 16, 2012

Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

837d5b6e

Btrfs: recover balance on mount · 59641015

Committed by Ilya Dryomov on Jan 16, 2012

On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart to continue roughly where previous balance (or convert)
was interrupted.  For chunk types that were being converted to some
profile we turn on soft convert, in case of a simple balance we turn on
usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

59641015

Btrfs: add basic restriper infrastructure · c9e9f97b

Committed by Ilya Dryomov on Jan 16, 2012

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

c9e9f97b

Btrfs: get rid of *_alloc_profile fields · 6fef8df1

Committed by Ilya Dryomov on Jan 16, 2012

{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6fef8df1

11 Jan, 2012 1 commit

Btrfs: fix possible deadlock when opening a seed device · b367e47f

Committed by Li Zefan on Dec 07, 2011

The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

    open_ctree()
        lock(chunk_mutex);
        read_chunk_tree();
            read_one_dev();
                open_seed_devices();
                    lock(uuid_mutex);

and then we hit a lockdep splat.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

b367e47f
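
The lock ordering rule stated in the seed device commit above (b367e47f) can be modelled with a tiny rank checker. This is a simplified userspace sketch and has nothing to do with how lockdep is implemented: the `lock_rank` values, `struct held_locks`, and `acquire` are hypothetical, and releasing locks is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks follow the documented order uuid_mutex -> volume_mutex -> chunk_mutex. */
enum lock_rank { UUID_MUTEX = 1, VOLUME_MUTEX = 2, CHUNK_MUTEX = 3 };

struct held_locks {
    int highest;   /* highest rank currently held, 0 means none */
};

/* Take `next` only if it ranks above everything already held.
 * Returns false on an ordering violation. */
static bool acquire(struct held_locks *h, enum lock_rank next)
{
    assert(h);
    if ((int)next <= h->highest)
        return false;
    h->highest = (int)next;
    return true;
}
```

In this model the chain from the commit message, chunk_mutex first and uuid_mutex later, is rejected at the second step, while taking the three locks in the documented order succeeds.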

22 Dec, 2011 2 commits

Btrfs: mark delayed refs as for cow · 66d7e7f0

Committed by Arne Jansen on Sep 12, 2011

Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
from every call site. The for_cow parameter will later on be used to
determine if a ref will change anything with respect to qgroups.

Delayed refs coming from relocation are always counted as for_cow, as they
don't change subvol quota.

Also pass in the fs_info for later use.

btrfs_find_all_roots() will use this as an optimization, as changes that are
for_cow will not change anything with respect to which root points to a
certain leaf. Thus, we don't need to add the current sequence number to
those delayed refs.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

66d7e7f0

Btrfs: integrate integrity check module into btrfs · 21adbd5c

Committed by Stefan Behrens on Nov 09, 2011

This is the last part of the patch series. It modifies the btrfs
code to use the integrity check module if configured to do so
with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
the only effective change is that code is added that handles the
mount option to activate the integrity check. If the mount option is
set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
complains in the log and the mount fails with EINVAL.

Add the mount option to activate the usage of the integrity check
code.
Add invocation of btrfs integrity check code init and cleanup
function on mount and umount, respectively.
Add hook to call btrfs integrity check code version of
submit_bh/submit_bio.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

21adbd5c

16 Dec, 2011 1 commit

Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a

Committed by Josef Bacik on Nov 18, 2011

Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff.  First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
failed to create a worker.  Also check_pending_worker_creates needs to call
__btrfs_start_work in its work function since it already increments
num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

0dc3b84a
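
The accounting rule from the async thread commit above (0dc3b84a), namely that num_workers_starting must be decremented when starting a worker fails, can be sketched as follows. `struct worker_pool` and `start_one_worker` are hypothetical, the failure is simulated with a flag, and `-ENOMEM` is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced model of a worker pool's counters. */
struct worker_pool {
    int num_workers;            /* workers that are up and running */
    int num_workers_starting;   /* workers currently being created */
};

/* Start exactly one worker. `creation_fails` simulates thread creation
 * failing; on that path the "starting" count is rolled back so it does
 * not stay inflated. */
static int start_one_worker(struct worker_pool *pool, bool creation_fails)
{
    pool->num_workers_starting++;
    if (creation_fails) {
        pool->num_workers_starting--;   /* the rollback the commit adds */
        return -ENOMEM;
    }
    pool->num_workers_starting--;
    pool->num_workers++;
    assert(pool->num_workers_starting >= 0);
    return 0;
}
```

After a failed start both counters are back where they began, and the other workers are left untouched.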

20 Nov, 2011 2 commits

btrfs: mirror_num should be int, not u64 · 32240a91

Committed by Jan Schmidt on Nov 20, 2011

My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

32240a91

Btrfs: fix barrier flushes · 387125fc

Committed by Chris Mason on Nov 18, 2011

When btrfs is writing the super blocks, it sends barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down.  When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures.  We fix it
with some new code to send down empty barriers to all devices before
writing the first super.

Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
barrier loop.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>

387125fc

10 Nov, 2011 2 commits

Btrfs: close devices on all error paths in open_ctree() · 586e46e2

Committed by Ilya Dryomov on Nov 09, 2011

Fix a bug introduced by 7e662854 where we would leave devices busy on
certain error paths in open_ctree(). fs_info is guaranteed to be
non-NULL now so it's safe to dereference it on all error paths.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

586e46e2

Btrfs: avoid null dereference and leaks when bailing from open_ctree() · 4d34b278

Committed by Ilya Dryomov on Nov 09, 2011

Fix bugs introduced by 6c41761f.  Firstly, after failing to allocate any
of the tree roots (first 'goto fail' in open_ctree()) we would
dereference a NULL fs_info pointer in free_fs_info().  Secondly, after
failures from init_srcu_struct(), setup_bdi() and new_inode() we would
leak all earlier allocated roots: fs_info fields haven't been
initialized yet so free_fs_info() is rendered useless.

Fix this by initializing fs_info pointer and fs_info fields before any
allocations happen.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

4d34b278

07 Nov, 2011 1 commit

Btrfs: check for a null fs root when writing to the backup root log · 7c7e82a7

Committed by Chris Mason on Nov 06, 2011

During log replay, we can commit the transaction before the fs_root
pointers are set up, so we have to make sure they are not null before
trying to use them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7c7e82a7

06 Nov, 2011 6 commits

Btrfs: stop the readahead threads on failed mount · 306c8b68

Committed by Chris Mason on Nov 03, 2011

If we don't stop them, they linger around corrupting
memory by using pointers to freed things.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

306c8b68

Btrfs: fix extent_buffer leak in the metadata IO error handling · c674e04e

Committed by Chris Mason on Nov 03, 2011

The scrub readahead branch brought in a new error handling hook,
but it was leaking extent_buffer references.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c674e04e

Btrfs: make a delayed_block_rsv for the delayed item insertion · 6d668dda

Committed by Josef Bacik on Nov 03, 2011

I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff. It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff. So instead create a separate block rsv for the delayed
insertion stuff. This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

6d668dda

Btrfs: add a log of past tree roots · af31f5e5

Committed by Chris Mason on Nov 03, 2011

This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.

It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

af31f5e5
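
The idea in the commit above (af31f5e5) of keeping root information for the last four commits can be pictured as a small ring buffer. The sketch below is an illustration only: `struct root_backup`, `struct backup_log`, and both functions are hypothetical, the real on-disk layout records more roots per entry, and treating generation 0 as "empty slot" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BACKUP_ROOTS 4   /* "the last four commits" */

/* Hypothetical, reduced entry: one root pointer per commit. */
struct root_backup {
    uint64_t generation;     /* 0 marks an unused slot (assumption) */
    uint64_t tree_root;
};

struct backup_log {
    struct root_backup slots[NUM_BACKUP_ROOTS];
    unsigned int next;       /* slot overwritten by the next commit */
};

/* Record a commit, overwriting the oldest entry once the log is full. */
static void backup_log_record(struct backup_log *log, uint64_t gen,
                              uint64_t tree_root)
{
    assert(log);
    log->slots[log->next].generation = gen;
    log->slots[log->next].tree_root = tree_root;
    log->next = (log->next + 1) % NUM_BACKUP_ROOTS;
}

/* Recovery-style lookup: newest entry with generation <= max_gen,
 * or a null pointer if none is left in the log. */
static const struct root_backup *backup_log_find(const struct backup_log *log,
                                                 uint64_t max_gen)
{
    const struct root_backup *best = 0;

    for (unsigned int i = 0; i < NUM_BACKUP_ROOTS; i++) {
        const struct root_backup *b = &log->slots[i];

        if (b->generation == 0 || b->generation > max_gen)
            continue;
        if (!best || b->generation > best->generation)
            best = b;
    }
    return best;
}
```

After six recorded commits only generations 3 through 6 remain, so a lookup that needs generation 2 or older finds nothing.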

btrfs: separate superblock items out of fs_info · 6c41761f

Committed by David Sterba on Apr 13, 2011

fs_info has now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of its dynamically allocated
members.
Signed-off-by: David Sterba <dsterba@suse.cz>

6c41761f

Btrfs: make sure to flush queued bios if write_cache_pages waits · 01d658f2

Committed by Chris Mason on Nov 01, 2011

write_cache_pages tries to build up a large bio to stuff down the pipe.
But if it needs to wait for a page lock, it needs to make sure and send
down any pending writes so we don't deadlock with anyone who has the
page lock and is waiting for writeback of things inside the bio.

Dave Sterba triggered this as a deadlock between the autodefrag code and
the extent write_cache_pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

01d658f2

02 Nov, 2011 1 commit

filesystems: add set_nlink() · bfe86848

Committed by Miklos Szeredi on Oct 28, 2011

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfe86848

20 Oct, 2011 3 commits

Btrfs: allow us to overcommit our enospc reservations · 2bf64758

Committed by Josef Bacik on Sep 26, 2011

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't too much that can be
done about this because when we are completely full we really need them to work
like this so we don't under reserve.  However if there is plenty of unallocated
chunks on the disk we can use that to gauge how much we can overcommit.  So this
patch adds chunk free space accounting so we always know how much unallocated
space we have.  Then if we fail to make a reservation within our allocated
space, check to see if we can overcommit.  In the normal flushing case (like
with delalloc metadata reservations) we'll take the free space and divide it by
2 if our metadata profile is setup for DUP or any of those, and then divide it
by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
case (we really need this reservation now!) we only limit ourselves to half of
the free space.  This makes this fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

2bf64758
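
The overcommit arithmetic described in the commit above (2bf64758) fits in a few lines. This is a sketch of the stated rule, not the kernel function: `overcommit_limit` and its parameters are hypothetical names, and applying the halving for DUP-like profiles in the non-flushing case as well is an assumption, since the message only spells it out for the flushing case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* How many bytes beyond the allocated space a reservation may overcommit,
 * given the unallocated chunk space on the disk. */
static uint64_t overcommit_limit(uint64_t unallocated, bool mirrored_profile,
                                 bool can_flush)
{
    uint64_t avail = unallocated;

    if (mirrored_profile)   /* DUP and similar: every byte is stored twice */
        avail /= 2;
    if (can_flush)
        avail /= 8;         /* normal case: stay conservative */
    else
        avail /= 2;         /* must succeed now: allow up to half */

    assert(avail <= unallocated);
    return avail;
}
```

For 1600 units of unallocated space this gives 100 for a flushing reservation on a DUP-like profile and 800 for a non-flushing reservation on a single-copy profile.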

Btrfs: put the block group cache after we commit the super · 300e4f8a

Committed by Josef Bacik on Aug 29, 2011

In moving some enospc stuff around I noticed that when we unmount we are often
evicting the free space cache inodes before we do our last commit. This isn't
bad, but it makes us constantly have to re-read the inodes back. So instead
don't evict the cache until after we do our last commit, this will make things a
little less crappy and makes a future enospc change work properly. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

300e4f8a

Btrfs: kill the durable block rsv stuff · 37be25bc

Committed by Josef Bacik on Aug 05, 2011

This is confusing code and isn't used by anything anymore, so delete it.
Signed-off-by: Josef Bacik <josef@redhat.com>

37be25bc

02 Oct, 2011 4 commits

btrfs: hooks for readahead · 4bb31e92

Committed by Arne Jansen on Jun 10, 2011

This adds the hooks needed for readahead. In the readpage_end_io_hook,
the extent state is checked for the EXTENT_READAHEAD flag. Only in this
case the readahead hook is called, to keep the impact on non-ra as low
as possible.
Additionally, a hook for a failed IO is added, otherwise readahead would
wait indefinitely for the extent to finish.

Changes for v2:
 - eliminate race condition
Signed-off-by: Arne Jansen <sensille@gmx.net>

4bb31e92

btrfs: state information for readahead · 90519d66

Committed by Arne Jansen on May 23, 2011

Add state information for readahead to btrfs_fs_info and btrfs_device.

Changes v2:
 - don't wait in radix_trees
 - add own set of workers for readahead
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Arne Jansen <sensille@gmx.net>

90519d66

btrfs: add READAHEAD extent buffer flag · ab0fff03

Committed by Arne Jansen on May 23, 2011

Add a READAHEAD extent buffer flag.
Add a function to trigger a read with this flag set.

Changes v2:
 - use extent buffer flags instead of extent state flags

Changes v5:
 - adapt to changed read_extent_buffer_pages interface
 - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set
Signed-off-by: Arne Jansen <sensille@gmx.net>

ab0fff03

btrfs: add an extra wait mode to read_extent_buffer_pages · bb82ab88

Committed by Arne Jansen on Jun 10, 2011

read_extent_buffer_pages currently has two modes, either trigger a read
without waiting for anything, or wait for the I/O to finish. The former
also bails when it's unable to lock the page. This patch now adds an
additional parameter to allow it to block on the page lock, but not wait
for completion.

Changes v5:
 - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
   WAIT_PAGE_LOCK

Change v6:
 - fix bug introduced in v5
Signed-off-by: Arne Jansen <sensille@gmx.net>

bb82ab88

29 Sep, 2011 1 commit

btrfs: add mirror_num to extent_read_full_page · 8ddc7d9c

Committed by Jan Schmidt on Jun 13, 2011

Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

8ddc7d9c

28 Jul, 2011 3 commits

Btrfs: make a lockdep class for each root · 85d4e461

Committed by Chris Mason on Jul 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

85d4e461

Btrfs: stop using highmem for extent_buffers · a6591715

Committed by Chris Mason on Jul 19, 2011

The extent_buffers have a very complex interface where
we use HIGHMEM for metadata and try to cache a kmap mapping
to access the memory.

The next commit adds reader/writer locks, and concurrent use
of this kmap cache would make it even more complex.

This commit drops the ability to use HIGHMEM with extent buffers,
and rips out all of the related code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6591715

Btrfs: use a worker thread to do caching · bab39bf9

Committed by Josef Bacik on Jun 30, 2011

A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,
Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Josef Bacik <josef@redhat.com>

bab39bf9

20 Jul, 2011 1 commit

btrfs: kill magical embedded struct superblock · 0ee5dc67

Committed by Al Viro on Jul 07, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0ee5dc67

18 Jun, 2011 3 commits

btrfs: fix uninitialized return value · 35a30d7c

Committed by David Sterba on Jun 13, 2011

When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.
Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

35a30d7c
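
The class of bug fixed in the commit above (35a30d7c) looks roughly like the following userspace sketch. It is a simplification: `read_root` is a hypothetical function that returns an int and takes a flag to simulate allocation failure, whereas the kernel function has a different signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct root {
    int id;
};

/* Hypothetical reader. `alloc_fails` simulates the allocation failing.
 * The essential point is that `ret` is assigned on the failure path
 * before control reaches the shared exit label. */
static int read_root(int alloc_fails, struct root **result)
{
    int ret = 0;
    struct root *r;

    assert(result != NULL);
    r = alloc_fails ? NULL : malloc(sizeof(*r));
    if (!r) {
        ret = -ENOMEM;   /* without this, a garbage value could be returned */
        goto done;
    }
    r->id = 1;
    *result = r;
done:
    return ret;
}
```

Initializing `ret` at its declaration and setting it explicitly on every failure branch keeps the shared exit path from returning an indeterminate value.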

btrfs: Remove unused sysfs code · 9fe6a50f

Committed by Maarten Lankhorst on Jun 16, 2011

Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9fe6a50f

Btrfs: fix relocation races · 7585717f

Committed by Chris Mason on Jun 13, 2011

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation.  The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations.  The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7585717f

13 Jun, 2011 1 commit

Btrfs: check the return value from set_anon_super · ac08aedf

Committed by Chris Mason on Jun 13, 2011

Al Viro noticed we weren't checking for set_anon_super failures.  This
adds the required checks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ac08aedf