Commits · 9c5085c147989d48dfe74194b48affc23f376650 · xiphi1978 / linux

15 Jun, 2012 8 commits

Btrfs: implement ->show_devname · 9c5085c1

Committed by Josef Bacik on Jun 05, 2012

Because btrfs can remove the device that was mounted we need to have a
->show_devname so that in this case we can print out some other device in
the file system to /proc/mounts.  So if there are multiple devices in a btrfs
file system we will just print the device with the lowest devid that we can
find.  This will make everything consistent and deal with device removal
properly.  The drawback is if you mount with a device that is higher than
the lowest devid it won't show up as the mounted device in /proc/mounts,
but this is a small price to pay. This was inspired by Miao Xie's patch.
Thanks,
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
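
The selection rule described in this commit message (print the device with the lowest devid that can be found) can be sketched in plain userspace C. This is only an illustration: `struct demo_device` and `demo_show_devname()` are hypothetical names, not the kernel's structures or the actual ->show_devname implementation, and treating a NULL name as "device not available" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device entry; not the kernel's struct. */
struct demo_device {
    uint64_t devid;     /* device id within the file system */
    const char *name;   /* path such as "/dev/sdb7"; NULL if unavailable (assumption) */
};

/* Return the name of the available device with the lowest devid, or NULL. */
static const char *demo_show_devname(const struct demo_device *devs, size_t n)
{
    const struct demo_device *first = NULL;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].name == NULL)
            continue;   /* skip devices that cannot be printed */
        if (first == NULL || devs[i].devid < first->devid)
            first = &devs[i];
    }
    return first ? first->name : NULL;
}
```

Because the result depends only on the set of devices, removing the device that was originally passed to mount does not leave a stale name behind.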

Btrfs: use rcu to protect device->name · 606686ee

Committed by Josef Bacik on Jun 04, 2012

Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use freed memory. Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock(). This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch. Thanks,
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: unlock everything properly in the error case for nocow · 17ca04af

Committed by Josef Bacik on May 31, 2012

I was getting hung on umount when a transaction was aborted because a range
of one of the free space inodes was still locked. This is because the nocow
stuff doesn't unlock anything on error. This fixed the problem and I
verified that is what was happening. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix btrfs_destroy_marked_extents · ee670f0a

Committed by Josef Bacik on May 31, 2012

So we're forcing the eb's to have their ref count set to 1 so invalidatepage
works but this breaks lots of things, for example root nodes, and is just
plain wrong, we don't need to just evict all of this stuff. Also drop the
invalidatepage altogether and add a page_cache_release(). With this patch
we no longer hang when trying to access the root nodes after an aborted
transaction and we no longer leak memory. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: abort the transaction if the commit fails · 7b8b92af

Committed by Josef Bacik on May 31, 2012

If a transaction commit fails we don't abort it so we don't set an error on
the file system. This patch fixes that by actually calling the abort stuff
and then adding a check for a fs error in the transaction start stuff to
make sure it is caught properly. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3

Committed by Josef Bacik on May 31, 2012

I was getting lots of hung tasks and a NULL pointer dereference because we
are not cleaning up the transaction properly when it aborts. First we need
to reset the running_transaction to NULL so we don't get a bad dereference
for any start_transaction callers after this. Also we cannot rely on
waitqueue_active() since it's just a list_empty(), so just call wake_up()
directly since that will do the barrier for us and such. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix locking in btrfs_destroy_delayed_refs · b939d1ab

Committed by Josef Bacik on May 31, 2012

The transaction abort stuff was throwing warnings from the list debugging
code because we do a list_del_init outside of the delayed_refs spin lock.
The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
but we need to take the ref head mutex to make sure it's not being processed
currently, and so if it is we need to drop the spin lock and then take and
drop the mutex and do the search again. If we can take the mutex then we
can safely remove the head from the list and carry on. Now when the
transaction aborts I don't get the list debugging warnings. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error · beb42dd7

Committed by Josef Bacik on May 30, 2012

While doing my enospc work I got a transaction abort that resulted in a
panic when we tried to unlock_page() an already unlocked page.  This is
because we aren't calling extent_clear_unlock_delalloc with the locked page
so it was unlocking all the pages in the range.  This is wrong since
__extent_writepage expects to have the page locked still unless we return
*page_started as 1.  This should keep us from panicking.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

01 Jun, 2012 5 commits

Btrfs: fix tree mod log rewinded level and rewinding of moved keys · c3193108

Committed by Jan Schmidt on May 31, 2012

When we rewind REMOVE_WHILE_FREEING operations, there's code that allocates
a fresh buffer instead of cloning the old one. Setting that buffer's level
correctly was missing in this case.

When rewinding a MOVE_KEYS operation, btrfs_node_key_ptr_offset(slot) was
missing for memmove_extent_buffer()'s arguments.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: fix tree mod log del_ptr · f395694c

Committed by Jan Schmidt on May 31, 2012

Logging for del_ptr when we're not deleting the last pointer was wrong. This
fixes both the duplicate log entries and the log sequence.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add tree_mod_dont_log helper · e9b7fd4d

Committed by Jan Schmidt on May 31, 2012

Replace duplicate code with a small inline helper function.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add missing spin_lock for insertion into tree mod log · 926dd8a6

Committed by Jan Schmidt on May 31, 2012

tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before
doing so.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add inodes before dropping the extent lock in find_all_leafs · 3301958b

Committed by Jan Schmidt on May 30, 2012

We must build up the inode list with the extent lock held after following
indirect refs.

This also requires an extension to ulists, which allows modifying the stored
aux value in case a key already exists in the list.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

31 May, 2012 1 commit

Btrfs: use delayed ref sequence numbers for all fs-tree updates · 95a06077

Committed by Jan Schmidt on May 29, 2012

The sequence number for delayed refs is needed to postpone certain delayed
refs for a very short period while walking backrefs. Before the tree
modification log, we thought we'd only have to hold back those references
that don't have a counter operation.

Now that we have the tree mod log, we're rewinding fs tree blocks to a
defined consistent state. We cannot know in advance for which tree block
we'll be doing rewind operations later. Therefore, we must postpone all the
delayed refs for fs-tree blocks, even those having a counter operation.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

30 May, 2012 26 commits

Btrfs: fix false positive in check-integrity on unmount · 48235a68

Committed by Stefan Behrens on May 23, 2012

During unmount, it could happen that the integrity checker printed a
warning message "attempt to free ... on umount which is not yet iodone"
which turned out to be a false positive.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: fix runtime warning in check-integrity check data mode · 86ff7ffc

Committed by Stefan Behrens on Apr 24, 2012

If a file_extent_item was located at the very end of a leaf and there was
not enough space to hold a full item, but there was enough space to hold
one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a
short item, a warning was printed anyway. This check is now fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: set ioprio of scrub readahead to idle · 3d136a11

Committed by Stefan Behrens on Feb 03, 2012

Reduce ioprio class of scrub readahead threads to idle priority.
This setting is fixed. This priority has shown the best performance
during all measurements.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: fix return code in drop_objectid_items · 5bdbeb21

Committed by Josef Bacik on May 29, 2012

So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file which is really slow in btrfs. This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log. This is because drop_objectid_items() would return 1
since it does a btrfs_search_slot() which returns 1. In tree-log jargon
this means that we have to commit the transaction to be safe. So just check
if ret is greater than 0 and set it to 0 if it is. With this patch we now
use the tree-log instead of committing the entire transaction, which is
twice as fast on my box. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
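
The return-code fix described in this commit message can be illustrated with a small userspace sketch. The names `demo_search_slot()` and `demo_drop_items()` are hypothetical stand-ins that only mimic the convention from the message (negative for error, 0 for found, 1 for not found); this is not the kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical search: 0 and *slot set if key is found, 1 if it is not present. */
static int demo_search_slot(const int *keys, size_t n, int key, size_t *slot)
{
    for (size_t i = 0; i < n; i++) {
        if (keys[i] == key) {
            *slot = i;
            return 0;
        }
    }
    return 1;
}

/* Remove every copy of key; returns 0 on success, never the search's "1". */
static int demo_drop_items(int *keys, size_t *n, int key)
{
    size_t slot;
    int ret;

    while ((ret = demo_search_slot(keys, *n, key, &slot)) == 0) {
        /* close the gap left by the removed entry */
        memmove(&keys[slot], &keys[slot + 1],
                (*n - slot - 1) * sizeof(keys[0]));
        (*n)--;
    }
    /* "not found" only means nothing is left to drop, so do not leak
     * the positive value to callers that give it another meaning */
    if (ret > 0)
        ret = 0;
    return ret;
}
```

Without the final clamp, the loop would always end by returning 1, which is the situation the message describes for directory fsync.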

Btrfs: check to see if the inode is in the log before fsyncing · 22ee6985

Committed by Josef Bacik on May 29, 2012

We have this check down in the actual logging code, but this is after we
start a transaction and all that good stuff. So move the helper
inode_in_log() out so we can call it in fsync() and avoid starting a
transaction altogether and just exit if we've already fsync()'ed this file
recently. You would notice this issue if you fsync()'ed a file over and
over again until the transaction committed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: return value of btrfs_read_buffer is checked correctly · 018642a1

Committed by Tsutomu Itoh on May 29, 2012

btrfs_read_buffer() can return an error, so add code that checks its
return value.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

Btrfs: read device stats on mount, write modified ones during commit · 733f4fbb

Committed by Stefan Behrens on May 25, 2012

The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: add ioctl to get and reset the device stats · c11d2c23

Committed by Stefan Behrens on May 25, 2012

An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
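
One way to picture the difference between "get" and "atomically get and reset" is a C11 atomic counter. This is a sketch under assumptions: the `demo_stat_*` helpers are hypothetical and say nothing about how the ioctls themselves are implemented.

```c
#include <stdatomic.h>

/* Count one more error on a hypothetical per-device counter. */
static void demo_stat_inc(atomic_int *counter)
{
    atomic_fetch_add(counter, 1);
}

/* Plain "get": the counter keeps its value. */
static int demo_stat_read(atomic_int *counter)
{
    return atomic_load(counter);
}

/* "Get and reset" in one atomic step: no increment can slip in between
 * reading the old value and storing zero. */
static int demo_stat_read_and_reset(atomic_int *counter)
{
    return atomic_exchange(counter, 0);
}
```

A separate read followed by a store of zero could lose an increment that lands between the two operations; the exchange avoids that window.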

Btrfs: add device counters for detected IO and checksum errors · 442a4f63

Committed by Stefan Behrens on May 25, 2012

The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

btrfs: Drop unused function btrfs_abort_devices() · d07eb911

Committed by Asias He on May 25, 2012

1) This function is not used anywhere.

2) Using blk_abort_queue() to abort the queue does not seem correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>

Btrfs: fix the same inode id problem when doing auto defragment · 762f2263

Committed by Miao Xie on May 24, 2012

Two files in different subvolumes may have the same inode id, so the
rb-tree that is used to manage the defragment objects must take this
into account. This patch fixes the problem.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
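
Since an inode id alone is ambiguous across subvolumes, a tree keyed on it needs a compound key. The comparator below is a minimal sketch: `struct demo_defrag_key`, its field names, and the choice to compare the root first are illustrative assumptions, not the patch itself.

```c
#include <stdint.h>

/* Hypothetical compound key: subvolume root id plus inode id. */
struct demo_defrag_key {
    uint64_t root;
    uint64_t ino;
};

/* Three-way comparison suitable for ordering nodes in a search tree. */
static int demo_defrag_cmp(const struct demo_defrag_key *a,
                           const struct demo_defrag_key *b)
{
    if (a->root != b->root)
        return a->root < b->root ? -1 : 1;
    if (a->ino != b->ino)
        return a->ino < b->ino ? -1 : 1;
    return 0;   /* same subvolume and same inode id */
}
```

With this ordering, inode 257 in one subvolume and inode 257 in another compare as different keys and get separate tree nodes.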

Btrfs: fall back to non-inline if we don't have enough space · 2adcac1a

Committed by Josef Bacik on May 23, 2012

If cow_file_range_inline fails with ENOSPC we abort the transaction which
isn't very nice. This really shouldn't be happening anyways but there's no
sense in making it a horrible error when we can easily just go allocate
normal data space for this stuff. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix how we deal with the orphan block rsv · 8a35d95f

Committed by Josef Bacik on May 23, 2012

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations. So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation. I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph. Thanks,
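
The ordering this message asks for (release the inode's reservation first, only then decrement the per-root counter) can be sketched with C11 atomics. All names here (`demo_root`, `demo_inode`, `demo_orphan_add`, `demo_orphan_del`) are hypothetical, and the reservation is reduced to a single number for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode with some reserved space. */
struct demo_inode {
    unsigned long reserved;
};

/* Hypothetical root that only counts its orphans instead of listing them. */
struct demo_root {
    atomic_int orphan_inodes;
};

static void demo_root_init(struct demo_root *root)
{
    atomic_init(&root->orphan_inodes, 0);
}

static void demo_orphan_add(struct demo_root *root)
{
    atomic_fetch_add(&root->orphan_inodes, 1);
}

/* Returns true when the caller dropped the last orphan, i.e. when the
 * root may now free its orphan reserve. */
static bool demo_orphan_del(struct demo_root *root, struct demo_inode *inode)
{
    inode->reserved = 0;    /* release this inode's reservation first */
    /* fetch_sub returns the previous value; 1 means we were the last one */
    return atomic_fetch_sub(&root->orphan_inodes, 1) == 1;
}
```

Because the decrement happens after the release, the root cannot observe a zero count while an inode still holds reserved space.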

Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d

Committed by Josef Bacik on May 23, 2012

Miao pointed this out while I was working on an orphan problem that messing
with a bitfield where different ranges are protected by different locks
doesn't work out right. Turns out we've been doing this forever where we
have different parts of the bit field protected by either no lock at all or
different locks which could cause all sorts of weird problems including the
issue I was hitting. So instead make a runtime_flags thing that we use the
normal bit operations on that are all atomic so we can keep having our
no/different locking for the different flags and then make force_compress
its own thing so it can be treated normally. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
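
The point about bitfields is that updating one C bitfield member is a read-modify-write of the whole storage word, so flags guarded by different locks can overwrite each other. Atomic bit operations on a single flags word avoid that. The sketch below uses C11 atomics in userspace; the `demo_*` helpers and flag names are hypothetical and merely resemble the kernel's bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag numbers within one runtime flags word. */
enum {
    DEMO_INODE_ORPHAN = 0,
    DEMO_INODE_IN_DEFRAG = 1,
};

static void demo_set_bit(unsigned int nr, atomic_ulong *flags)
{
    atomic_fetch_or(flags, 1UL << nr);
}

static void demo_clear_bit(unsigned int nr, atomic_ulong *flags)
{
    atomic_fetch_and(flags, ~(1UL << nr));
}

static bool demo_test_bit(unsigned int nr, atomic_ulong *flags)
{
    return (atomic_load(flags) & (1UL << nr)) != 0;
}

/* Set the bit and report whether it was already set, in one atomic step. */
static bool demo_test_and_set_bit(unsigned int nr, atomic_ulong *flags)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or(flags, mask) & mask) != 0;
}
```

Each helper touches only its own bit atomically, so two threads changing different flags never lose each other's update, whatever locks they do or do not hold.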

Btrfs: merge contigous regions when loading free space cache · cd023e7b

Committed by Josef Bacik on May 14, 2012

When we write out the free space cache we will write out everything that is
in our in memory tree, and then we will just walk the pinned extents tree
and write anything we see there. The problem with this is that during
normal operations the pinned extents will be merged back into the free space
tree normally, and then we can allocate space from the merged areas and
commit them to the tree log. If we crash and replay the tree log we will
crash again because the tree log will try to free up space from what looks
like 2 separate but contiguous entries, since one entry is from the original
free space cache and the other was a pinned extent that was merged back. To
fix this we just need to walk the free space tree after we load it and merge
contiguous entries back together. This will keep the tree log stuff from
breaking and it will make the allocator behave more nicely. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
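
The merge step described here (walk the loaded free space entries and join neighbours that touch) can be sketched on a sorted array. The kernel keeps these entries in a tree; the array, `struct demo_free_space`, and `demo_merge_free_space()` are simplifying assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free space entry: [offset, offset + bytes). */
struct demo_free_space {
    uint64_t offset;
    uint64_t bytes;
};

/* Merge entries that are exactly adjacent. The array must be sorted by
 * offset and non-overlapping. Returns the new number of entries. */
static size_t demo_merge_free_space(struct demo_free_space *e, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (e[out].offset + e[out].bytes == e[i].offset)
            e[out].bytes += e[i].bytes;   /* contiguous: extend the previous entry */
        else
            e[++out] = e[i];              /* gap: keep as a separate entry */
    }
    return out + 1;
}
```

After the pass, a range that was stored as two touching entries is represented once, which is the state the message says the tree log replay expects.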

Btrfs: do not do balance in readonly mode · 9ba1f6e4

Committed by Liu Bo on May 11, 2012

In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   -----------------------> works!

This case should not be an exception, so add another check for the
mount flags.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>

Btrfs: use fastpath in extent state ops as much as possible · d1ac6e41

Committed by Liu Bo on May 10, 2012

Fully utilize our extent state's new helper functions to use
fastpath as much as possible.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>

Btrfs: fix wrong error returned by adding a device · f8c5d0b4

Committed by Liu Bo on May 10, 2012

Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument

Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
a readonly notification is preferred.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>

Btrfs: finish ordered extents in their own thread · 5fd02043

Committed by Josef Bacik on May 02, 2012

We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independently of ending the writeback on a
page. This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done. Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread. This makes direct writes
quite a bit faster. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: do not check delalloc when updating disk_i_size · 4e899152

Committed by Josef Bacik on May 02, 2012

We are checking delalloc to see if it is ok to update the i_size.  There are
2 cases where it stops us from updating:

1) If there is delalloc between our current disk_i_size and this ordered
extent

2) If there is delalloc between our current ordered extent and the next
ordered extent

These tests are racy however since we can set delalloc for these ranges at
any time.  Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly.  However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: avoid buffer overrun in mount option handling · f60d16a8

Committed by Jim Meyering on Apr 25, 2012

There is an off-by-one error: allocating room for a maximal result
string but without room for a trailing NUL.  That can lead to
returning a transformed string that is not NUL-terminated, and
then to a caller reading beyond end of the malloc'd buffer.

Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
(the result is guaranteed to fit), remove dead strlen at end, and
change a few variable names and comments.
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
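
The off-by-one described in this message is the classic "allocated strlen() bytes, forgot the terminator" mistake. The userspace sketch below shows the corrected sizing with a hypothetical `demo_join()` helper; it uses malloc() where kernel code would use kmalloc(), and it is not the function the patch touches.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated "head" + "tail" string, or NULL on failure.
 * The caller frees the result. */
static char *demo_join(const char *head, const char *tail)
{
    size_t hlen = strlen(head);
    size_t tlen = strlen(tail);
    char *buf = malloc(hlen + tlen + 1);   /* +1 leaves room for the NUL */

    if (buf == NULL)
        return NULL;
    /* The sizes are known to fit, so plain memcpy is enough. */
    memcpy(buf, head, hlen);
    memcpy(buf + hlen, tail, tlen + 1);    /* copies tail's terminating NUL too */
    return buf;
}
```

Because every byte of the buffer is written, a zeroing allocator is unnecessary, which matches the message's switch away from kzalloc.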

Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result · a27202fb

Committed by Jim Meyering on Apr 26, 2012

A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.
Signed-off-by: Jim Meyering <meyering@redhat.com>
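
strncpy() does not write a terminator when the source is as long as the destination or longer, which is the situation this message describes. A minimal sketch of the usual remedy follows; `DEMO_PATH_NAME_MAX` (deliberately tiny), `struct demo_dev_info`, and `demo_fill_path()` are hypothetical and do not reproduce the ioctl code.

```c
#include <string.h>

#define DEMO_PATH_NAME_MAX 8   /* hypothetical size, tiny for demonstration */

/* Hypothetical result buffer handed back to the caller. */
struct demo_dev_info {
    char path[DEMO_PATH_NAME_MAX];
};

static void demo_fill_path(struct demo_dev_info *di, const char *name)
{
    /* Copies at most sizeof(path) bytes; may leave path unterminated. */
    strncpy(di->path, name, sizeof(di->path));
    /* Force termination so long names are truncated, never unterminated. */
    di->path[sizeof(di->path) - 1] = '\0';
}
```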

Btrfs: avoid buffer overrun in btrfs_printk · f07c9a79

Committed by Jim Meyering on Apr 26, 2012

The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit.  NUL-terminate
after strncpy.  Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.
Signed-off-by: Jim Meyering <meyering@redhat.com>

Fix minor type issues · 2eec6c81

Committed by Daniel J Blueman on Apr 26, 2012

Address some minor type issues identified by sparse checker.
Signed-off-by: Daniel J Blueman <daniel@quora.org>

btrfs: allow changing 'thread_pool' size at remount time · 0d2450ab

Committed by Sergei Trofimovich on Apr 24, 2012

'mount -oremount,thread_pool=2 /' had no effect:

The maximum number of worker threads is specified in 2 places:
- in 'struct btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'

'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

Fix it by pushing new maximum value to all created worker structures
as well.

Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
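
The bug pattern here is a setting stored in two places with only one of them updated. A minimal sketch of the fix, pushing the new value into every worker structure as well, follows. The `demo_*` structures and the fixed pool count are hypothetical simplifications, not the btrfs structures.

```c
#include <stddef.h>

#define DEMO_NR_POOLS 3   /* hypothetical number of worker pools */

/* Hypothetical worker pool with its own copy of the limit. */
struct demo_workers {
    int max_workers;
};

/* Hypothetical file system info holding the global setting and the pools. */
struct demo_fs_info {
    int thread_pool_size;
    struct demo_workers pools[DEMO_NR_POOLS];
};

/* Apply a new thread_pool value at remount time. */
static void demo_resize_thread_pool(struct demo_fs_info *fs, int new_size)
{
    fs->thread_pool_size = new_size;
    /* Updating only the field above would leave every pool on its old limit. */
    for (size_t i = 0; i < DEMO_NR_POOLS; i++)
        fs->pools[i].max_workers = new_size;
}
```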

Btrfs: do not do filemap_write_and_wait_range in fsync · 0885ef5b

Committed by Josef Bacik on Apr 23, 2012

We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>