提交 · d280e5be940931c84bb2e9831ead9d02bc785484 · openeuler / Kernel

29 8月, 2012 3 次提交

Btrfs: remove superblock writing after fatal error · 68ce9682

由 Stefan Behrens 提交于 8月 01, 2012

With commit acce952b, btrfs was changed to flag the filesystem with
BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
error happened like a write I/O errors of all mirrors.
In such situations, on unmount, the superblock is written in
btrfs_error_commit_super(). This is done with the intention to be able
to evaluate the error flag on the next mount. A warning is printed
in this case during the next mount and the log tree is ignored.

The issue is that it is possible that the superblock points to a root
that was not written (due to write I/O errors).
The result is that the filesystem cannot be mounted. btrfsck also does
not start and all the other btrfs-progs tools fail to start as well.
However, mount -o recovery is working well and does the right things
to recover the filesystem (i.e., don't use the log root, clear the
free space cache and use the next mountable root that is stored in the
root backup array).

This patch removes the writing of the superblock when
BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
flag in the mount function.

These lines can be used to reproduce the issue (using /dev/sdm):
SCRATCH_DEV=/dev/sdm
SCRATCH_MNT=/mnt
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
ls -alLF /dev/mapper/foo
mkfs.btrfs /dev/mapper/foo
mount /dev/mapper/foo $SCRATCH_MNT
echo bar > $SCRATCH_MNT/foo
sync
echo 0 25165824 error | dmsetup reload foo
dmsetup resume foo
ls -alF $SCRATCH_MNT
touch $SCRATCH_MNT/1
ls -alF $SCRATCH_MNT
sleep 35
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
dmsetup resume foo
sleep 1
umount $SCRATCH_MNT
btrfsck /dev/mapper/foo
dmsetup remove foo
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>

68ce9682

Btrfs: barrier before waitqueue_active · 66657b31

由 Josef Bacik 提交于 8月 01, 2012

We need a barrir before calling waitqueue_active otherwise we will miss
wakeups.  So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
and then add an explicit memory barrier everywhere else that need them.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

66657b31

Btrfs: fix deadlock in wait_for_more_refs · 1fa11e26

由 Arne Jansen 提交于 8月 06, 2012

Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the io to complete, while
the end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return as count is more like a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.
Signed-off-by: NArne Jansen <sensille@gmx.net>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

1fa11e26

26 7月, 2012 1 次提交

Btrfs: introduce subvol uuids and times · 8ea05e3a

由 Alexander Block 提交于 7月 25, 2012

This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: NAlexander Block <ablock84@googlemail.com>
Reviewed-by: NDavid Sterba <dave@jikos.cz>
Reviewed-by: NArne Jansen <sensille@gmx.net>
Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>

8ea05e3a

24 7月, 2012 3 次提交

Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() · c0901581

由 Stefan Behrens 提交于 7月 10, 2012

From btree_read_extent_buffer_pages(), currently repair_io_failure()
can be called with mirror_num being zero when submit_one_bio() returned
an error before. This used to cause a BUG_ON(!mirror_num) in
repair_io_failure() and indeed this is not a case that needs the I/O
repair code to rewrite disk blocks.
This commit prevents calling repair_io_failure() in this case and thus
avoids the BUG_ON() and malfunction.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

c0901581

Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting · 44c44af2

由 Ilya Dryomov 提交于 6月 22, 2012

There used to be a BUG_ON(ret) there before EH patch (79787eaa) went in.
Bail out with EINVAL.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

44c44af2

I
Btrfs: do not return EINVAL instead of ENOMEM from open_ctree() · fed425c7
由 Ilya Dryomov 提交于 6月 22, 2012
```
When bailing from open_ctree() err is returned, not ret.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
fed425c7

12 7月, 2012 1 次提交

Btrfs: quota tree support and startup · bcef60f2

由 Arne Jansen 提交于 9月 13, 2011

Init the quota tree along with the others on open_ctree
and close_ctree. Add the quota tree to the list of well
known trees in btrfs_read_fs_root_no_name.
Signed-off-by: NArne Jansen <sensille@gmx.net>

bcef60f2

10 7月, 2012 3 次提交

A
Btrfs: qgroup state and initialization · 416ac51d
由 Arne Jansen 提交于 9月 13, 2011
```
Add state to fs_info.
Signed-off-by: NArne Jansen <sensille@gmx.net>
```
416ac51d

Btrfs: added helper to create new trees · 20897f5c

由 Arne Jansen 提交于 9月 13, 2011

This creates a brand new tree. Will be used to create
the quota tree.
Signed-off-by: NArne Jansen <sensille@gmx.net>

20897f5c

Btrfs: join tree mod log code with the code holding back delayed refs · 097b8a7c

由 Jan Schmidt 提交于 6月 21, 2012

We've got two mechanisms both required for reliable backref resolving (tree
mod log and holding back delayed refs). You cannot make use of one without
the other. So instead of requiring the user of this mechanism to setup both
correctly, we join them into a single interface.

Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list
as we did before, which was of no value.
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>

097b8a7c

03 7月, 2012 2 次提交

Btrfs: resume balance on rw (re)mounts properly · 2b6ba629

由 Ilya Dryomov 提交于 6月 22, 2012

This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

2b6ba629

Btrfs: restore restriper state on all mounts · 68310a5e

由 Ilya Dryomov 提交于 6月 22, 2012

Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

68310a5e

21 6月, 2012 1 次提交

Btrfs: add a missing spin_lock · e18fca73

由 Josef Bacik 提交于 6月 18, 2012

When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,
Btrfs: add a missing spin_lock

When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e18fca73

15 6月, 2012 8 次提交

Btrfs: destroy the items of the delayed inodes in error handling routine · 67cde344

由 Miao Xie 提交于 6月 14, 2012

the items of the delayed inodes were forgotten to be freed, this patch
fixes it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

67cde344

Btrfs: make sure that we've made everything in pinned tree clean · ed0eaa14

由 Liu Bo 提交于 6月 14, 2012

Since we have two trees for recording pinned extents, we need to go through
both of them to make sure that we've done everything clean.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

ed0eaa14

Btrfs: avoid memory leak of extent state in error handling routine · 6e841e32

由 Liu Bo 提交于 6月 14, 2012

We've forgotten to clear extent states in pinned tree, which will results in
space counter mismatch and memory leak:

WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

6e841e32

Btrfs: fix incompat flags setting · 69e380d1

由 Li Zefan 提交于 6月 11, 2012

It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
has only one bit set.
Signed-off-by: NLi Zefan <lizefan@huawei.com>

69e380d1

Btrfs: use rcu to protect device->name · 606686ee

由 Josef Bacik 提交于 6月 04, 2012

Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory. Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock(). This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch. Thanks,
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <josef@redhat.com>

606686ee

Btrfs: fix btrfs_destroy_marked_extents · ee670f0a

由 Josef Bacik 提交于 5月 31, 2012

So we're forcing the eb's to have their ref count set to 1 so invalidatepage
works but this breaks lots of things, for example root nodes, and is just
plain wrong, we don't need to just evict all of this stuff. Also drop the
invalidatepage altogether and add a page_cache_release(). With this patch
we no longer hang when trying to access the root nodes after an aborted
transaction and we no longer leak memory. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

ee670f0a

Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3

由 Josef Bacik 提交于 5月 31, 2012

I was getting lots of hung tasks and a NULL pointer dereference because we
are not cleaning up the transaction properly when it aborts. First we need
to reset the running_transaction to NULL so we don't get a bad dereference
for any start_transaction callers after this. Also we cannot rely on
waitqueue_active() since it's just a list_empty(), so just call wake_up()
directly since that will do the barrier for us and such. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

d7096fc3

Btrfs: fix locking in btrfs_destroy_delayed_refs · b939d1ab

由 Josef Bacik 提交于 5月 31, 2012

The transaction abort stuff was throwing warnings from the list debugging
code because we do a list_del_init outside of the delayed_refs spin lock.
The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
but we need to take the ref head mutex to make sure it's not being processed
currently, and so if it is we need to drop the spin lock and then take and
drop the mutex and do the search again. If we can take the mutex then we
can safely remove the head from the list and carry on. Now when the
transaction aborts I don't get the list debugging warnings. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

b939d1ab

30 5月, 2012 6 次提交

Btrfs: read device stats on mount, write modified ones during commit · 733f4fbb

由 Stefan Behrens 提交于 5月 25, 2012

The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>

733f4fbb

Btrfs: add device counters for detected IO and checksum errors · 442a4f63

由 Stefan Behrens 提交于 5月 25, 2012

The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>

442a4f63

btrfs: Drop unused function btrfs_abort_devices() · d07eb911

由 Asias He 提交于 5月 25, 2012

1) This function is not used anywhere.

2) Using the blk_abort_queue() to abort the queue seems not correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NAsias He <asias@redhat.com>

d07eb911

Btrfs: fix how we deal with the orphan block rsv · 8a35d95f

由 Josef Bacik 提交于 5月 23, 2012

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations. So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation. I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph. Thanks,
Btrfs: fix how we deal with the orphan block rsv

8a35d95f

Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d

由 Josef Bacik 提交于 5月 23, 2012

Miao pointed this out while I was working on an orphan problem that messing
with a bitfield where different ranges are protected by different locks
doesn't work out right. Turns out we've been doing this forever where we
have different parts of the bit field protected by either no lock at all or
different locks which could cause all sorts of weird problems including the
issue I was hitting. So instead make a runtime_flags thing that we use the
normal bit operations on that are all atomic so we can keep having our
no/different locking for the different flags and then make force_compress
it's own thing so it can be treated normally. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

72ac3c0d

Btrfs: finish ordered extents in their own thread · 5fd02043

由 Josef Bacik 提交于 5月 02, 2012

We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page. This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done. Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread. This makes direct writes
quite a bit faster. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

5fd02043

26 5月, 2012 2 次提交

J
Btrfs: add tree mod log to fs_info · f29021b2
由 Jan Schmidt 提交于 5月 16, 2012
```
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
```
f29021b2

Btrfs: don't set for_cow parameter for tree block functions · 5581a51a

由 Jan Schmidt 提交于 5月 16, 2012

Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed
parameter for_cow = 1. In fact, these two functions should never mark
their tree modification operations as for_cow, because they can change
the number of blocks referenced by a tree.

Hence, we remove the extra for_cow parameter from these functions and
make them pass a zero down.
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>

5581a51a

06 5月, 2012 1 次提交

Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919

由 Chris Mason 提交于 5月 06, 2012

verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.

But, a few callers are using it with spinlocks held.  Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case.  This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verifiy things.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b9fab919

19 4月, 2012 2 次提交

Btrfs: always store the mirror we read the eb from · 5cf1ab56

由 Josef Bacik 提交于 4月 16, 2012

A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid. This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success. So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

5cf1ab56

Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZE · 8d082fb7

由 Liu Bo 提交于 4月 03, 2012

Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE.
It will lead to hanging-on while writing something.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>

8d082fb7

06 4月, 2012 1 次提交
- J
  btrfs: assignment in write_dev_flush() doesn't need two semi-colons · 9c017abc
  由 Jesper Juhl 提交于 2月 27, 2012
```
One is enough.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
```
  9c017abc
30 3月, 2012 1 次提交

Btrfs: update the checks for mixed block groups with big metadata blocks · bc3f116f

由 Chris Mason 提交于 3月 29, 2012

Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB.  But these ended up in the wrong place
and it wasn't testing the feature flag correctly.

This updates the tests to make sure our sizes are matching
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bc3f116f

29 3月, 2012 3 次提交

Btrfs: flush out and clean up any block device pages during mount · 3c4bb26b

由 Chris Mason 提交于 3月 27, 2012

Btrfs puts the filesystem metadata into its own address space, and
somehow the block device address space isn't getting onto disk properly
before a mount.  The end result is that a loop of mkfs and mounting the
filesystem will sometimes find stale or incorrect data.

This commit should fix it by sprinkling fdatawrites and invalidate_bdev
calls around.  This is a short term measure to make sure it is fixed.
The block devices really should be flushed and cleaned up higher in the
stack.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3c4bb26b

btrfs: disallow unequal data/metadata blocksize for mixed block groups · 65139ed9

由 David Sterba 提交于 2月 17, 2012

With support for bigger metadata blocks, we must avoid mounting a
filesystem with different block size for mixed block groups, this causes
corruption (found by xfstests/083).
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

65139ed9

Btrfs: enhance superblock sanity checks · fcd1f065

由 David Sterba 提交于 3月 06, 2012

Validate checksum algorithm during mount and prevent BUG_ON later in
btrfs_super_csum_size.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

fcd1f065

27 3月, 2012 2 次提交

Btrfs: deal with read errors on extent buffers differently · ea466794

由 Josef Bacik 提交于 3月 26, 2012

Since we need to read and write extent buffers in their entirety we can't use
the normal bio_readpage_error stuff since it only works on a per page basis. So
instead make it so that if we see an io error in endio we just mark the eb as
having an IO error and then in btree_read_extent_buffer_pages we will manually
try other mirrors and then overwrite the bad mirror if we find a good copy.
This works with larger than page size blocks. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ea466794

Btrfs: don't use threaded IO completion helpers for metadata writes · f3f266ab

由 Chris Mason 提交于 3月 23, 2012

The metadata write IO completion code is now simple enough that we
don't need the threaded helpers anymore.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f3f266ab

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功