Commits · e999376f094162aa425ae749aa1df95ab928d010 · gsplhtlxg / clone-Linux

18 Jun, 2011 7 commits

Btrfs: avoid delayed metadata items during commits · e999376f

Authored by Chris Mason on Jun 17, 2011

Snapshot creation has two phases.  One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.

The delayed metadata insertion code can break that rule: it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.

This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e999376f

btrfs: fix uninitialized return value · 35a30d7c

Authored by David Sterba on Jun 13, 2011

When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.
Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

35a30d7c
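
The bug class fixed in 35a30d7c can be sketched in userspace C. This is only an illustration, not the kernel function: `demo_read_root`, `struct demo_root`, and the `simulate_enomem` switch are hypothetical names, and the real function's signature is not reproduced here. The point is that every path reaching the shared exit label assigns `ret` first.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a root object; not the kernel's struct btrfs_root. */
struct demo_root {
    int id;
};

/*
 * Hypothetical loader. When simulate_enomem is nonzero the allocation is
 * treated as failed. Returns 0 on success or a negative errno; ret is
 * assigned on the failure path before it is returned.
 */
static int demo_read_root(struct demo_root **out, int simulate_enomem)
{
    int ret;
    struct demo_root *root = simulate_enomem ? NULL : malloc(sizeof(*root));

    if (!root) {
        ret = -ENOMEM;  /* without this assignment, ret would be garbage */
        goto fail;
    }
    root->id = 1;
    *out = root;
    return 0;
fail:
    *out = NULL;
    return ret;
}
```

A caller can then rely on the return value being either 0 or a meaningful negative errno.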

btrfs: fix wrong reservation when doing delayed inode operations · 19fd2949

Authored by Miao Xie on Jun 15, 2011

We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when doing delayed inode operations, and the following Oops
happened:

[ 9792.654889] ------------[ cut here ]------------
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: G        W   2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922]  [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b
[ 9792.654925]  [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c
[ 9792.654933]  [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953]  [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963]  [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970]  [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978]  [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986]  [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022]  [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025]  [<ffffffff8147afac>] ? _cond_resched+0xe/0x22
[ 9792.655027]  [<ffffffff8147b892>] ? mutex_lock+0x29/0x50
[ 9792.655039]  [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051]  [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062]  [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064]  [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075]  [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077]  [<ffffffff81132bd6>] evict+0x71/0x111
[ 9792.655079]  [<ffffffff81132de0>] iput+0x12a/0x132
[ 9792.655081]  [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155
[ 9792.655083]  [<ffffffff81127b83>] ? path_put+0x1f/0x23
[ 9792.655085]  [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171
[ 9792.655087]  [<ffffffff81128410>] ? putname+0x34/0x36
[ 9792.655090]  [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b
[ 9792.655092]  [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---

This patch fixes it by setting the reservation of the transaction handle to the
correct one.
Reported-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

19fd2949

btrfs: Remove unused sysfs code · 9fe6a50f

Authored by Maarten Lankhorst on Jun 16, 2011

Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9fe6a50f

btrfs: fix dereference of ERR_PTR value · 3ed4498c

Authored by David Sterba on Jun 13, 2011

smatch reports:

btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing
possible ERR_PTR()
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3ed4498c
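
The smatch warning in 3ed4498c concerns the kernel idiom of encoding an errno inside a pointer. A minimal userspace sketch of that idiom follows. The `demo_*` helpers are re-creations written for illustration (assumptions, not the kernel's own macros), and `demo_lookup` is a hypothetical function that can fail. The pointer is tested before it is dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR idiom, for illustration only. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long demo_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int demo_is_err(const void *ptr)
{
    /* Error values occupy the top DEMO_MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_root { int last_trans; };

static struct demo_root demo_root_obj = { .last_trans = 7 };

/* Hypothetical lookup that fails with -ENOENT when fail is nonzero. */
static void *demo_lookup(int fail)
{
    return fail ? demo_err_ptr(-ENOENT) : (void *)&demo_root_obj;
}

/* Check for an encoded error before dereferencing the result. */
static long demo_replay(int fail)
{
    struct demo_root *dest = demo_lookup(fail);

    if (demo_is_err(dest))
        return demo_ptr_err(dest);
    return dest->last_trans;
}
```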

Merge branch 'for-chris' of... · e038dca8

Authored by Chris Mason on Jun 17, 2011

Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

Conflicts:
	fs/btrfs/transaction.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e038dca8

Btrfs: fix relocation races · 7585717f

Authored by Chris Mason on Jun 13, 2011

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation.  The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations.  The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7585717f

16 Jun, 2011 3 commits

Btrfs: set no_trans_join after trying to expand the transaction · ed0ca140

Authored by Josef Bacik on Jun 14, 2011

We can lock up if we try to allow new writers to join the transaction and we have
flushoncommit set or have a pending snapshot. This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set. So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13. It cannot be reproduced with this patch. Thanks,
Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <josef@redhat.com>

ed0ca140

Btrfs: protect the pending_snapshots list with trans_lock · 8351583e

Authored by Josef Bacik on Jun 14, 2011

Currently there is nothing protecting the pending_snapshots list on the
transaction. We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption. So protect
this list with the trans_lock. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

8351583e

Btrfs: fix path leakage on subvol deletion · 71d7aed0

Authored by Josef Bacik on Jun 14, 2011

The delayed ref patch accidentally removed the btrfs_free_path in
btrfs_unlink_subvol. This puts it back so we don't leak a path. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

71d7aed0

13 Jun, 2011 2 commits

Btrfs: drop the delalloc_bytes check in shrink_delalloc · f4c44016

Authored by Chris Mason on Jun 13, 2011

Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f4c44016

Btrfs: check the return value from set_anon_super · ac08aedf

Authored by Chris Mason on Jun 13, 2011

Al Viro noticed we weren't checking for set_anon_super failures.  This
adds the required checks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ac08aedf

11 Jun, 2011 9 commits

Btrfs: use join_transaction in btrfs_evict_inode() · 30b4caf5

Authored by Li Zefan on Jun 08, 2011

The WARN_ON() in start_transaction() was triggered while balancing.

The cause is btrfs_relocate_chunk() started a transaction and
then called iput() on the inode that stores free space cache,
and iput() called btrfs_start_transaction() again.
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

30b4caf5

Btrfs - use %pU to print fsid · 22b63a29

Authored by Ilya Dryomov on Feb 09, 2011

Get rid of FIXME comment.  Uuids from dmesg are now the same as uuids
given by btrfs-progs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

22b63a29

Btrfs: fix extent state leak on failed nodatasum reads · 08d2f347

Authored by Jan Schmidt on May 04, 2011

When encountering an EIO while reading from a nodatasum extent, we
insert an error record into the inode's failure tree.
btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd
better clear the failure tree in that case, otherwise the kernel
complains about

	BUG extent_state: Objects remaining on kmem_cache_close()

on rmmod.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

08d2f347

Merge branch 'for-chris' of... · 0e735872

Authored by Chris Mason on Jun 10, 2011

Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into for-linus

0e735872

btrfs: fix unlocked access of delalloc_inodes · 5be76758

Authored by David Sterba on Jun 09, 2011

list_splice_init will make delalloc_inodes empty, but without a spinlock
around it, this may produce a corrupted list head, which is accessed in many places.
The race window is very tight and nobody seems to have hit it so far.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5be76758

Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() · 027ed2f0

Authored by Li Zefan on Jun 08, 2011

The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so
don't declare the variable on stack.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

027ed2f0
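
The approach described in 027ed2f0 is to move a large structure from the stack to the heap. A rough userspace sketch, assuming a hypothetical 1 KiB `struct demo_fs_info_args` and using `calloc`/`free` where kernel code would use its own allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 1 KiB argument block, roughly the size the commit mentions. */
struct demo_fs_info_args {
    unsigned char payload[1024];
};

/*
 * Allocate the large structure on the heap rather than declaring it as a
 * local variable, so the function's stack frame stays small.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int demo_fill_info(unsigned char *first_byte)
{
    struct demo_fs_info_args *args = calloc(1, sizeof(*args));

    if (!args)
        return -1;
    args->payload[0] = 42;          /* placeholder for filling in real data */
    *first_byte = args->payload[0];
    free(args);
    return 0;
}
```

The cost is an extra failure path for the allocation, which the caller has to handle.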

btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline · 9eb9104c

Authored by richard kennedy on Jun 07, 2011

Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit
builds. This shrinks its size to 128 bytes allowing it to fit into one
fewer cache lines and allows more objects per slab in its kmem_cache.

slabinfo extent_buffer reports :-

 before:-
    Sizes (bytes)     Slabs
    ----------------------------------
    Object :     136  Total  :     123
    SlabObj:     136  Full   :     121
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      30

 after :-
    Object :     128  Total  :       4
    SlabObj:     128  Full   :       2
    SlabSiz:    4096  Partial:       0
    Loss   :       0  CpuSlab:       2
    Align  :       8  Objects:      32
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9eb9104c
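
The effect behind 9eb9104c (136 bytes down to 128 in the slabinfo output above) comes from member ordering. The sketch below uses two hypothetical structs, not the real extent_buffer layout, to show how placing a 4-byte member between 8-byte members can cost alignment padding that reordering avoids. Exact sizes depend on the ABI, so only the reordered size is stated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with 4-byte members separated by 8-byte members. */
struct demo_padded {
    uint64_t start;
    uint32_t flags;     /* typically followed by 4 bytes of padding */
    uint64_t len;
    uint32_t refs;      /* typically followed by 4 bytes of tail padding */
};

/* Same members, with the two 4-byte fields placed next to each other. */
struct demo_reordered {
    uint64_t start;
    uint64_t len;
    uint32_t flags;
    uint32_t refs;
};

/* Bytes saved by reordering on the current ABI (may be 0 on some targets). */
static size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_padded) - sizeof(struct demo_reordered);
}
```

On a typical 64-bit build `demo_bytes_saved()` reports 8, the same amount the commit message quotes for extent_buffer, though that match is a property of this toy layout rather than of the real structure.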

Btrfs: clear current->journal_info on async transaction commit · 38e88054

Authored by Sage Weil on Jun 10, 2011

Normally current->journal_info is cleared by commit_transaction.  For an
async snap or subvol creation, though, it runs in a work queue.  Clear
it in btrfs_commit_transaction_async() to avoid leaking a non-NULL
journal_info when we return to userspace.  When the actual commit runs in
the other thread it won't care that its current->journal_info is already
NULL.
Signed-off-by: Sage Weil <sage@newdream.net>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

38e88054

Btrfs: make sure to recheck for bitmaps in clusters · 38e87880

Authored by Chris Mason on Jun 10, 2011

Josef recently changed the free extent cache to look in
the block group cluster for any bitmaps before trying to
add a new bitmap for the same offset.  This avoids BUG_ON()s due to
covering duplicate ranges.

But it didn't go quite far enough.  A given free range might span
between one or more bitmaps or free space entries.  The code has
looping to cover this, but it doesn't check for clustered bitmaps
every time.

This shuffles our gotos to check for a bitmap in the cluster
for every new bitmap entry we try to add.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

38e87880

10 Jun, 2011 4 commits

btrfs: remove unneeded includes from scrub.c · 6eef3125

Authored by Arne Jansen on Jun 10, 2011

Signed-off-by: Arne Jansen <sensille@gmx.net>

6eef3125

btrfs: reinitialize scrub workers · 632dd772

Authored by Arne Jansen on Jun 10, 2011

Scrub starts the workers each time a scrub starts and stops them after it
finished. This patch adds an initialization for the workers before each
start, otherwise the workers behave strangely.
Signed-off-by: Arne Jansen <sensille@gmx.net>

632dd772

btrfs: scrub: errors in tree enumeration · 8c51032f

Authored by Arne Jansen on Jun 03, 2011

Due to the semantics of btrfs_search_slot the path can point to an
invalid slot when ret > 0. This condition went unnoticed, which in
turn could have led to an incomplete scrubbing.
Signed-off-by: Arne Jansen <sensille@gmx.net>

8c51032f
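
The condition described in 8c51032f can be modelled with a sorted array standing in for a b-tree leaf. This is a simplified sketch, not btrfs code: `demo_search_slot` and `struct demo_leaf` are hypothetical, and the model only mirrors the convention that a return value of 1 means "not found, slot is the insertion position", which may be one past the last valid item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf: a sorted array of keys, standing in for a b-tree leaf. */
struct demo_leaf {
    const unsigned long *keys;
    size_t nritems;
};

/*
 * Returns 0 when key is found and 1 when it is not. In both cases *slot is
 * the position where key is or would be inserted, so on a miss *slot may be
 * equal to nritems, which is one past the last valid item.
 */
static int demo_search_slot(const struct demo_leaf *leaf, unsigned long key,
                            size_t *slot)
{
    size_t lo = 0, hi = leaf->nritems;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (leaf->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < leaf->nritems && leaf->keys[lo] == key) ? 0 : 1;
}

/* Validate the slot before reading it; returns 1 and stores the key if valid. */
static int demo_first_key_at_or_after(const struct demo_leaf *leaf,
                                      unsigned long key, unsigned long *found)
{
    size_t slot;

    demo_search_slot(leaf, key, &slot);
    if (slot >= leaf->nritems)
        return 0;       /* a real tree walk would move on to the next leaf */
    *found = leaf->keys[slot];
    return 1;
}
```

Skipping the `slot >= nritems` check is the kind of oversight that lets an enumeration read a stale item or stop early.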

Btrfs: don't map extent buffer if path->skip_locking is set · ad3e34bb

Authored by Josef Bacik on Jun 08, 2011

Arne's scrub stuff exposed a problem with mapping the extent buffer in
reada_for_search. He searches the commit root with multiple threads and with
skip_locking set, so we can race and overwrite node->map_token since node isn't
locked. So fix this so that we only map the extent buffer if we don't already
have a map_token and skip_locking isn't set. Without this patch scrub would
panic almost immediately, with the patch it doesn't panic anymore. Thanks,
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <josef@redhat.com>

ad3e34bb

09 Jun, 2011 8 commits

Btrfs: unlock the trans lock properly · 3473f3c0

Authored by Josef Bacik on Jun 09, 2011

In btrfs_wait_for_commit if we came upon a transaction that had committed we
just exited, but that's bad since we are holding the trans_lock.  So break
instead so that the lock is dropped.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Reported-by: David Sterba <dsterba@suse.cz>

3473f3c0
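
The control-flow fix in 3473f3c0 can be illustrated with a toy lock that merely counts acquire and release calls. Everything here is hypothetical (`demo_lock`, `demo_trans`, `demo_find_uncommitted`) and it does not reproduce the real function. It only shows why `break` is used where a direct `return` would skip the unlock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock that only counts acquire/release calls; stands in for a spinlock. */
struct demo_lock { int held; };
static void demo_lock_acquire(struct demo_lock *l) { l->held++; }
static void demo_lock_release(struct demo_lock *l) { l->held--; }

struct demo_trans { unsigned long transid; int committed; };

/*
 * Scan transactions under the lock. When the wanted transaction has already
 * committed, break out of the loop so the single unlock below still runs;
 * returning from inside the loop would leave the lock held.
 * Returns 1 if the caller would need to wait, 0 otherwise.
 */
static int demo_find_uncommitted(struct demo_lock *lock,
                                 const struct demo_trans *list, size_t n,
                                 unsigned long transid)
{
    int need_wait = 0;
    size_t i;

    demo_lock_acquire(lock);
    for (i = 0; i < n; i++) {
        if (list[i].transid != transid)
            continue;
        if (list[i].committed)
            break;      /* not "return 0": the lock is still held here */
        need_wait = 1;
        break;
    }
    demo_lock_release(lock);
    return need_wait;
}
```

Keeping a single release point after the loop makes it harder for a later early exit to reintroduce the leak.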

Btrfs: don't map extent buffer if path->skip_locking is set · 25b8b936

Authored by Josef Bacik on Jun 08, 2011

25b8b936

Btrfs: fix duplicate checking logic · f6a39829

Authored by Josef Bacik on Jun 06, 2011

When merging my code into the integration test the second check for duplicate
entries got screwed up. This patch fixes it by dropping ret2 and just using ret
for the return value, and checking if we got an error before adding the bitmap
to the local list. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

f6a39829

Btrfs: fix the allocator loop logic · 723bda20

Authored by Josef Bacik on May 27, 2011

I was testing with empty_cluster = 0 to try and reproduce a problem and kept
hitting early enospc panics.  This was because our loop logic was a little
confused.  So this is what I did:

1) Make the loop variable the ultimate decider on whether we should loop again
instead of checking to see if we had an uncached bg, empty size or empty
cluster.

2) Increment loop before checking to see what we are on to make the loop
definitions make more sense.

3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0
unless we didn't actually allocate a chunk.  If we did allocate a chunk we
should be able to easily setup a new cluster so clearing
empty_size/empty_cluster makes us less efficient.

This kept me from hitting panics while trying to reproduce the other problem.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

723bda20

Btrfs: fix bitmap regression · 2cdc342c

Authored by Josef Bacik on May 27, 2011

In cleaning up the clustering code I accidentally introduced a regression by
adding bitmap entries to the cluster rb tree. The problem is if we've maxed out
the number of bitmaps we can have for the block group we can only add free space
to the bitmaps, but since the bitmap is on the cluster we can't find it and we
try to create another one. This would result in a panic because the total
bitmaps was bigger than the max bitmaps that were allowed. This patch fixes
this by checking to see if we have a cluster, and then looking at the cluster rb
tree to see if it has a bitmap entry and if it does and that space belongs to
that bitmap, go ahead and add it to that bitmap.

I could hit this panic every time with an fs_mark test within a couple of
minutes. With this patch I no longer hit the panic and fs_mark goes to
completion. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

2cdc342c

Btrfs: don't commit the transaction if we dont have enough pinned bytes · f2bb8f5c

Authored by Josef Bacik on May 25, 2011

I noticed when running an enospc test that we would get stuck committing the
transaction in check_data_space even though we truly didn't have enough space.
So check to see if bytes_pinned is bigger than num_bytes; if it's not, don't
commit the transaction. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

f2bb8f5c

Btrfs: noinline the cluster searching functions · 3de85bb9

Authored by Josef Bacik on May 25, 2011

When profiling the find cluster code it's hard to tell where we are spending our
time because the bitmap and non-bitmap functions get inlined by the compiler, so
make that not happen.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

3de85bb9

Btrfs: cache bitmaps when searching for a cluster · 86d4a77b

Authored by Josef Bacik on May 25, 2011

If we are looking for a cluster in a particularly sparse or fragmented block
group, we will do a lot of looping through the free space tree looking for
various things, and if we need to look at bitmaps we will end up doing the whole
dance twice. So instead add the bitmap entries to a temporary list so if we
have to do the bitmap search we can just look through the list of entries we've
found quickly instead of having to loop through the entire tree again. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

86d4a77b

04 Jun, 2011 7 commits

btrfs: fix uninitialized variable warning · aa0467d8

Authored by David Sterba on Jun 03, 2011

With Linus' tree, today's linux-next build (powerpc ppc64_defconfig)
produced this warning:

fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
uninitialized in this function

Introduced by commit 16cdcec7 ("btrfs: implement delayed inode items
operation").

This fixes a bug in btrfs_update_inode(): if the returned value from
btrfs_delayed_update_inode is a nonzero garbage value, inode stat data are not
updated and several call paths may hit a BUG_ON or fail with a strange
code.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Sterba <dsterba@suse.cz>

aa0467d8

btrfs: add helper for fs_info->closing · 7841cb28

Authored by David Sterba on May 31, 2011

Wrap checking of filesystem 'closing' flag and fix a few missing memory
barriers.
Signed-off-by: David Sterba <dsterba@suse.cz>

7841cb28

Btrfs: add mount -o inode_cache · 4b9465cb

Authored by Chris Mason on Jun 03, 2011

This makes the inode map cache default to off until we
fix the overflow problem when the free space crcs don't fit
inside a single page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b9465cb

btrfs: scrub: add explicit plugging · e7786c3a

Authored by Arne Jansen on May 28, 2011

With the removal of the implicit plugging scrub ends up doing more and
smaller I/O than necessary. This patch adds explicit plugging per chunk.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e7786c3a

btrfs: use btrfs_ino to access inode number · a4689d2b

Authored by David Sterba on May 31, 2011

Commit 4cb5300b ("Btrfs: add mount -o auto_defrag") accesses inode
number directly while it should use the helper with the new inode
number allocator.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a4689d2b

Btrfs: don't save the inode cache if we are deleting this root · d132a538

Authored by Josef Bacik on May 31, 2011

With xfstest 254 I can panic the box every time with the inode number caching
stuff on. This is because we clean the inodes out when we delete the subvolume,
but then we write out the inode cache which adds an inode to the subvolume inode
tree, and then when it gets evicted again the root gets added back on the dead
roots list and is deleted again, so we have a double free. To stop this from
happening just return 0 if refs is 0 (and we're not the tree root since tree
root always has refs of 0). With this fix 254 no longer panics. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d132a538

btrfs: false BUG_ON when degraded · 5f3f302a

Authored by Arne Jansen on May 30, 2011

In degraded mode the struct btrfs_device of a missing dev doesn't have
device->name set. A kstrdup of NULL correctly returns NULL. Don't
BUG in this case.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5f3f302a
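
The distinction drawn in 5f3f302a, between a copy that is NULL because the source was NULL and one that is NULL because allocation failed, can be sketched in userspace C. `demo_strdup_or_null` is a stand-in written for this example (an assumption, not the kernel's kstrdup), and `demo_clone_name` is a hypothetical caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for kstrdup(): a NULL source yields NULL, not an error. */
static char *demo_strdup_or_null(const char *s)
{
    size_t len;
    char *copy;

    if (!s)
        return NULL;
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Copy a device name that may legitimately be absent. Returns 0 on success
 * and -1 only when a non-NULL name could not be duplicated.
 */
static int demo_clone_name(const char *name, char **out)
{
    *out = demo_strdup_or_null(name);
    if (name && !*out)
        return -1;      /* genuine allocation failure */
    return 0;           /* includes the "missing device, no name" case */
}
```

Treating every NULL result as fatal is what produced the false BUG_ON the commit title refers to.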