Commits · ba2d084055fd3f67af120070f5620173efd867c8 · openeuler / Kernel

21 Jan, 2016 1 commit

btrfs: sysfs: fix typo in compat_ro attribute definition · ba2d0840

Committed by David Sterba on Jan 20, 2016

Signed-off-by: David Sterba <dsterba@suse.com>

ba2d0840

12 Jan, 2016 1 commit

Merge branch 'for-chris-4.5' of... · 988f1f57

Committed by Chris Mason on Jan 11, 2016

Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>

988f1f57

11 Jan, 2016 2 commits

Merge branch 'misc-cleanups-4.5' of... · b28cf572

Committed by Chris Mason on Jan 11, 2016

Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>

b28cf572

Merge branch 'misc-for-4.5' of... · a3058101

Committed by Chris Mason on Jan 11, 2016

Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5

a3058101

08 Jan, 2016 1 commit

Btrfs: fix fitrim discarding device area reserved for boot loader's use · 8cdc7c5b

Committed by Filipe Manana on Jan 06, 2016

As of the 4.3 kernel release, the fitrim ioctl can now discard any region
of a disk that is not allocated to any chunk/block group, including the
first megabyte which is used for our primary superblock and by the boot
loader (grub for example).

Fix this by refusing to trim/discard any region of the device that starts
at an offset not greater than min(alloc_start_mount_option, 1Mb), just
as was the case before 4.3.

A reproducer test case for xfstests follows.

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1

  # Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are
  # reserved for a boot loader to use (GRUB for example) and btrfs should never
  # use them - neither for allocating metadata/data nor should trim/discard them.
  # The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem.
  $XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io
  $XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io

  # Now mount the filesystem and perform a fitrim against it.
  _scratch_mount
  _require_batched_discard $SCRATCH_MNT
  $FSTRIM_PROG $SCRATCH_MNT

  # Now unmount the filesystem and verify the content of the ranges was not
  # modified (no trim/discard happened on them).
  _scratch_unmount
  echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:"
  od -t x1 -N $((64 * 1024)) $SCRATCH_DEV
  od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV

  status=0
  exit
Reported-by: Vincent Petry <PVince81@yahoo.fr>
Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341
Fixes: 499f377f (btrfs: iterate over unused chunk space in FITRIM)
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Filipe Manana <fdmanana@suse.com>

8cdc7c5b
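
The effect of the fix in 8cdc7c5b can be pictured as clamping each trimmable free range so that it never starts inside the reserved area at the beginning of the device. The following is a minimal sketch only: `clamp_trim_range()` and `RESERVED_END` are hypothetical names, the boundary is fixed at 1 MiB, and the alloc_start mount option mentioned in the commit message is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the first 1 MiB of the device is off limits. */
#define RESERVED_END (1024ULL * 1024ULL)

/*
 * Hypothetical helper: shrink the free range [*start, *start + *len) so that
 * it begins at or after RESERVED_END. Returns false when nothing is left to
 * discard.
 */
static bool clamp_trim_range(uint64_t *start, uint64_t *len)
{
	uint64_t end;

	assert(start && len);
	end = *start + *len;
	if (end <= RESERVED_END)
		return false;		/* range lies entirely in the reserved area */
	if (*start < RESERVED_END) {
		*start = RESERVED_END;	/* skip the reserved prefix */
		*len = end - RESERVED_END;
	}
	return true;
}
```

A range that starts at offset 0 and spans 4 MiB would be reduced to the 3 MiB that follow the reserved area, while a range that ends before 1 MiB would be skipped entirely.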

07 Jan, 2016 31 commits

Btrfs: Check metadata redundancy on balance · ee592d07

Committed by Sam Tygier on Jan 06, 2016

When converting a filesystem via balance, check that the metadata mode
is at least as redundant as the data mode. For example, give a warning
when:
-dconvert=raid1 -mconvert=single
Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk>
[ minor message reformatting ]
Signed-off-by: David Sterba <dsterba@suse.com>

ee592d07
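
The check described in ee592d07 can be sketched as a comparison of how many whole-device failures each target profile survives. The commit message does not say how the kernel ranks profiles, so the enum, the table, and `should_warn()` below are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative profile list; the enum and its names are hypothetical. */
enum profile {
	P_SINGLE, P_DUP, P_RAID0, P_RAID1, P_RAID10, P_RAID5, P_RAID6, P_COUNT
};

/* Number of whole-device failures each profile can survive. */
static int tolerated_failures(enum profile p)
{
	static const int tolerated[P_COUNT] = {
		[P_SINGLE] = 0, [P_DUP] = 0, [P_RAID0] = 0,
		[P_RAID1] = 1, [P_RAID10] = 1, [P_RAID5] = 1, [P_RAID6] = 2,
	};

	assert((int)p >= 0 && (int)p < P_COUNT);
	return tolerated[p];
}

/* True when the conversion would leave metadata less redundant than data. */
static bool should_warn(enum profile meta, enum profile data)
{
	return tolerated_failures(meta) < tolerated_failures(data);
}
```

With this ranking, the `-dconvert=raid1 -mconvert=single` example from the commit message triggers the warning, while converting both to raid1 does not.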

btrfs: statfs: report zero available if metadata are exhausted · ca8a51b3

Committed by David Sterba on Oct 10, 2015

There is one ENOSPC case that's very confusing. Available is
greater than zero but no file operation succeeds (besides removing
files). This happens when the metadata are exhausted and there's no
possibility to allocate another chunk.

In this scenario it's normal that there's still some space in the data
chunk and the calculation in df reflects that in the Avail value.

To at least give some clue about the ENOSPC situation, let statfs report
zero value in Avail, even if there's still data space available.

Current:
  /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test

New:
  /dev/sdb1             4.0G  3.3G     0 100% /mnt/test

We calculate the remaining metadata space minus global reserve. If this
is (supposedly) smaller than zero, there's no space. But this does not
hold in practice: the exhausted state happens when there's still some
positive delta. So we apply some guesswork and compare the delta to a 4M
threshold. (Practically observed delta was 2M.)

We probably cannot calculate the exact threshold value because this
depends on the internal reservations requested by various operations, so
some operations that consume little metadata will succeed even if
Avail is zero. But this is better than the other way around.
Signed-off-by: David Sterba <dsterba@suse.com>

ca8a51b3
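
The threshold logic described in ca8a51b3 can be sketched as follows. The function name and its parameters are hypothetical. Only the "remaining metadata minus global reserve, compared to 4M" rule comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* The 4M guesswork threshold quoted in the commit message. */
#define META_THRESHOLD (4ULL * 1024 * 1024)

/*
 * Hypothetical helper: returns the value statfs would report as Avail.
 * When the remaining metadata space, minus the global reserve, drops below
 * the threshold, report zero even though data space is still free.
 */
static uint64_t reported_avail(uint64_t data_avail, uint64_t meta_total,
			       uint64_t meta_used, uint64_t global_reserve)
{
	uint64_t meta_free;

	assert(meta_used <= meta_total);
	meta_free = meta_total - meta_used;
	if (meta_free < global_reserve ||
	    meta_free - global_reserve < META_THRESHOLD)
		return 0;
	return data_avail;
}
```

With 16M of metadata free and a 14M global reserve, the delta is 2M (the value the commit message reports as observed in practice), so the sketch returns zero.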

btrfs: preallocate path for snapshot creation at ioctl time · 8546b570

Committed by David Sterba on Nov 10, 2015

We can also preallocate btrfs_path that's used during pending snapshot
creation and avoid another late ENOMEM failure.
Signed-off-by: David Sterba <dsterba@suse.com>

8546b570

btrfs: allocate root item at snapshot ioctl time · b0c0ea63

Committed by David Sterba on Nov 10, 2015

The actual snapshot creation is delayed until transaction commit. If we
cannot get enough memory for the root item there, we have to fail the
whole transaction commit which is bad. So we'll allocate the memory at
the ioctl call and pass it along with the pending_snapshot struct. The
potential ENOMEM will be returned to the caller of snapshot ioctl.
Signed-off-by: David Sterba <dsterba@suse.com>

b0c0ea63
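
The pattern behind b0c0ea63 (and the related preallocation commits around it) is to allocate early, where a failure is cheap to report, and to consume the memory later, where a failure would be costly. A minimal userspace sketch, with hypothetical stand-ins for the kernel structures and functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures named in the commits. */
struct root_item { char payload[64]; };
struct pending_snapshot { struct root_item *root_item; };

/* "ioctl time": allocate everything up front; failing here is harmless. */
static int snapshot_ioctl(struct pending_snapshot **out)
{
	struct pending_snapshot *pending;

	assert(out);
	pending = calloc(1, sizeof(*pending));
	if (!pending)
		return -ENOMEM;
	pending->root_item = calloc(1, sizeof(*pending->root_item));
	if (!pending->root_item) {
		free(pending);
		return -ENOMEM;		/* reported straight to the caller */
	}
	*out = pending;
	return 0;
}

/* "commit time": only consumes preallocated memory, so no ENOMEM here. */
static void commit_snapshot(struct pending_snapshot *pending)
{
	assert(pending && pending->root_item);
	free(pending->root_item);
	free(pending);
}
```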

btrfs: do an allocation earlier during snapshot creation · a1ee7362

Committed by David Sterba on Nov 10, 2015

We can allocate pending_snapshot earlier and do not have to do cleanup
in case of failure.
Signed-off-by: David Sterba <dsterba@suse.com>

a1ee7362

btrfs: use smaller type for btrfs_path locks · 4fb72bf2

Committed by David Sterba on Nov 27, 2015

The values of btrfs_path::locks are 0 to 4, so they fit into a u8. Let's see:

* overall size of btrfs_path drops down from 136 to 112 (-24 bytes),
* better packing in a slab page: +6 objects,
* the whole structure now fits into 2 cachelines,
* slight decrease in code size:

   text    data     bss     dec     hex filename
 938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
 938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after

(and the generated assembly does not change much)

The main purpose is to decrease the size of the structure without
affecting performance. Byte access is usually well behaved across
arches, the locks are not accessed frequently and sometimes just
compared to zero.

Note for further size reduction attempts: the slots could be made u16
but this might generate worse code on some arches (non-byte and non-int
access). Also the range of operations on slots is wider compared to
locks and the potential performance drop should be evaluated first.
Signed-off-by: David Sterba <dsterba@suse.com>

4fb72bf2

btrfs: use smaller type for btrfs_path lowest_level · 7853f15b

Committed by David Sterba on Nov 27, 2015

The level is 0..7, so we can use a smaller type. The size of btrfs_path is
now 136 bytes, down from 144, which means +2 objects fit into a 4k slab.
Signed-off-by: David Sterba <dsterba@suse.com>

7853f15b
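
The slab figures quoted in 4fb72bf2 and 7853f15b follow from integer division of the page size by the object size. The sketch below assumes a 4 KiB slab page with no per-object or per-slab overhead, which is a simplification. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one 4 KiB page per slab, no allocator metadata. */
#define SLAB_PAGE_SIZE 4096u

/* How many objects of the given size fit into one slab page. */
static unsigned int objects_per_page(size_t object_size)
{
	assert(object_size > 0 && object_size <= SLAB_PAGE_SIZE);
	return (unsigned int)(SLAB_PAGE_SIZE / object_size);
}

/* Extra objects per page gained by shrinking old_size to new_size. */
static unsigned int objects_gained(size_t old_size, size_t new_size)
{
	assert(new_size <= old_size);
	return objects_per_page(new_size) - objects_per_page(old_size);
}
```

Under these assumptions, shrinking btrfs_path from 144 to 136 bytes gives 30 instead of 28 objects per page, and shrinking it from 136 to 112 bytes gives 36 instead of 30, matching the +2 and +6 in the commit messages.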

btrfs: use smaller type for btrfs_path reada · dccabfad

Committed by David Sterba on Nov 27, 2015

The possible values for reada are all positive and bounded, so we can later
save some bytes by storing it in a u8.
Signed-off-by: David Sterba <dsterba@suse.com>

dccabfad

btrfs: cleanup, use enum values for btrfs_path reada · e4058b54

Committed by David Sterba on Nov 27, 2015

Replace the integers by enums for better readability. The value 2 has
not had any meaning since a7175319
"Btrfs: do less aggressive btree readahead" (2009-01-22).
Signed-off-by: David Sterba <dsterba@suse.com>

e4058b54

btrfs: constify static arrays · 4d4ab6d6

Committed by David Sterba on Nov 19, 2015

There are a few statically initialized arrays that can be made const.
The remaining ones (like file_system_type, sysfs attributes or prop handlers)
do not allow that due to type mismatch when passed to the APIs or
because the structures are modified through other members.
Signed-off-by: David Sterba <dsterba@suse.com>

4d4ab6d6

btrfs: constify remaining structs with function pointers · 20e5506b

Committed by David Sterba on Nov 19, 2015

* struct extent_io_ops
* struct btrfs_free_space_op
Signed-off-by: David Sterba <dsterba@suse.com>

20e5506b

btrfs tests: replace whole ops structure for free space tests · 28f0779a

Committed by David Sterba on Nov 19, 2015

Preparatory work for making btrfs_free_space_op constant. In
test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with
our own version, thus preventing constification. We can rework it so we
replace the whole structure with the correct function pointers.
Signed-off-by: David Sterba <dsterba@suse.com>

28f0779a
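
The rework in 28f0779a swaps a pointer to a whole ops table instead of overwriting one member of it, which is what allows the tables themselves to be const. A reduced sketch of the idea. The structures, callbacks, and `swap_ops()` are hypothetical and much smaller than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of a free-space ops table. */
struct free_space_ops {
	bool (*use_bitmap)(int free_bytes);
};

struct free_space_ctl {
	const struct free_space_ops *op;	/* points to a read-only table */
};

static bool default_use_bitmap(int free_bytes) { return free_bytes > 1024; }
static bool test_use_bitmap(int free_bytes) { (void)free_bytes; return false; }

/* Both tables can be const because nobody writes to their members. */
static const struct free_space_ops default_ops = {
	.use_bitmap = default_use_bitmap,
};
static const struct free_space_ops test_ops = {
	.use_bitmap = test_use_bitmap,
};

/* Swap the whole table; return the previous one so it can be restored. */
static const struct free_space_ops *swap_ops(struct free_space_ctl *ctl,
					     const struct free_space_ops *new_ops)
{
	const struct free_space_ops *old;

	assert(ctl && new_ops);
	old = ctl->op;
	ctl->op = new_ops;
	return old;
}
```

A test would call `swap_ops(&ctl, &test_ops)`, run its checks, and then pass the returned pointer back to `swap_ops()` to restore the default behaviour.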

btrfs: use list_for_each_entry* in backref.c · a7ca4225

Committed by Geliang Tang on Dec 21, 2015

Use list_for_each_entry*() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a7ca4225

btrfs: use list_for_each_entry_safe in free-space-cache.c · 7ae1681e

Committed by Geliang Tang on Dec 18, 2015

Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7ae1681e

btrfs: use list_for_each_entry* in check-integrity.c · b69f2bef

Committed by Geliang Tang on Dec 18, 2015

Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b69f2bef
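
The three list cleanups above (a7ca4225, 7ae1681e, b69f2bef) replace "iterate raw nodes, then convert each one by hand" with a macro that yields the containing entry directly. The sketch below re-implements simplified userspace stand-ins for the kernel macros. Unlike the kernel's `list_for_each_entry()`, the hypothetical `list_for_each_entry_typed()` takes the entry type as an explicit argument so that it stays within standard C.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified stand-ins for the kernel macros, for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)
#define list_for_each_entry_typed(pos, head, type, member) \
	for (pos = container_of((head)->next, type, member); \
	     &pos->member != (head); \
	     pos = container_of(pos->member.next, type, member))

struct ref { int count; struct list_head list; };

/* Before: walk the raw nodes and convert each one by hand. */
static int sum_old(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	assert(head);
	list_for_each(pos, head) {
		struct ref *r = container_of(pos, struct ref, list);

		sum += r->count;
	}
	return sum;
}

/* After: the macro hands out the containing entry directly. */
static int sum_new(struct list_head *head)
{
	struct ref *r;
	int sum = 0;

	assert(head);
	list_for_each_entry_typed(r, head, struct ref, list)
		sum += r->count;
	return sum;
}
```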

Btrfs: use linux/sizes.h to represent constants · ee22184b

Committed by Byongho Lee on Dec 15, 2015

We use many constants to represent size and offset values. To make the
code readable we use '256 * 1024 * 1024' instead of '268435456' to
represent '256MB'. However, we can make it far more readable with 'SZ_256M',
which is defined in 'linux/sizes.h'.

So this patch replaces expressions of the form 'xxx * 1024 * 1024' with a
single 'SZ_xxxM' if 'xxx' is a power of 2, and with 'xxx * SZ_1M' if 'xxx' is
not a power of 2. I haven't touched '4096' & '8192' because they are
more intuitive than 'SZ_4K' & 'SZ_8K'.
Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ee22184b
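
The replacement rule from ee22184b can be illustrated in userspace with local copies of two constants from `linux/sizes.h`. The two helper functions are hypothetical and exist only to show both cases.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; the kernel gets them from <linux/sizes.h>. */
#define SZ_1M   0x00100000
#define SZ_256M 0x10000000

/* Power-of-two size: one constant replaces 256 * 1024 * 1024. */
static uint64_t size_256m(void)
{
	return SZ_256M;
}

/* Non-power-of-two size: multiply SZ_1M instead of writing n * 1024 * 1024. */
static uint64_t size_in_mib(uint64_t mib)
{
	assert(mib <= UINT64_MAX / SZ_1M);	/* guard against overflow */
	return mib * SZ_1M;
}
```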

btrfs: cleanup, remove stray return statements · 7928d672

Committed by David Sterba on Nov 30, 2015

Signed-off-by: David Sterba <dsterba@suse.com>

7928d672

btrfs: zero out delayed node upon allocation · 352dd9c8

Committed by Alexandru Moise on Oct 25, 2015

It's slightly cleaner to zero out the delayed node upon allocation
than to do it by hand in btrfs_init_delayed_node() for a few members.
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

352dd9c8

btrfs: pass proper enum type to start_transaction() · 575a75d6

Committed by Alexandru Moise on Oct 25, 2015

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

575a75d6

btrfs: switch __btrfs_fs_incompat return type from int to bool · 9780c497

Committed by Alexandru Moise on Oct 18, 2015

Conform to the cast-to-bool (!!) in __btrfs_fs_incompat() by explicitly
returning a boolean rather than an int.
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9780c497
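
Why a flag test is normalized with `!!` and why `bool` is the natural return type can be seen with a flag that lives above bit 31. This is not the kernel's code: `FEATURE_HIGH` and both helpers are hypothetical, and the behaviour of the narrowing cast is implementation-defined (common compilers keep only the low bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag above bit 31, chosen to make truncation visible. */
#define FEATURE_HIGH (1ULL << 40)

/* Returning the masked value as int drops the high bits on common ABIs. */
static int has_flag_int(uint64_t flags, uint64_t flag)
{
	return (int)(flags & flag);
}

/* Normalizing with !! and returning bool keeps the answer intact. */
static bool has_flag_bool(uint64_t flags, uint64_t flag)
{
	assert(flag != 0);
	return !!(flags & flag);
}
```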

btrfs: remove unused inode argument from uncompress_inline() · e40da0e5

Committed by Byongho Lee on May 19, 2015

The inode argument has never been used, so remove it.
Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e40da0e5

btrfs: don't use slab cache for struct btrfs_delalloc_work · 100d5702

Committed by David Sterba on Dec 08, 2015

Although we prefer to use separate caches for various structs, it seems
better not to do that for struct btrfs_delalloc_work. Objects of this
type are allocated rarely, when transaction commit calls
btrfs_start_delalloc_roots, requesting delayed iputs.

The objects are temporary (with some IO involved) but still allocated
and freed within __start_delalloc_inodes. Memory allocation failure is
handled.

The slab cache is empty most of the time (observed on several systems),
so if we need to allocate a new slab object, the first one has to
allocate a full page. In a potential case of low memory conditions this
might fail with higher probability compared to using the generic slab
caches.
Signed-off-by: David Sterba <dsterba@suse.com>

100d5702

btrfs: drop duplicate prefix from scrub workqueues · 0de270fa

Committed by David Sterba on Dec 01, 2015

The helper btrfs_alloc_workqueue will add the "btrfs-" prefix.
Signed-off-by: David Sterba <dsterba@suse.com>

0de270fa

btrfs: verbose error when we find an unexpected item in sys_array · 93a3d467

Committed by David Sterba on Nov 30, 2015

Signed-off-by: David Sterba <dsterba@suse.com>

93a3d467

btrfs: handle invalid num_stripes in sys_array · f5cdedd7

Committed by David Sterba on Nov 30, 2015

We can handle the special case of num_stripes == 0 directly inside
btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to
catch other unhandled cases where we fail to validate external data.

A crafted or corrupted image crashes at mount time:

BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
BTRFS info (device loop0): disk space caching is enabled
BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
Stack:
 637af890 60062489 602aeb2e 604192ba
 60387961 00000011 637af8a0 6038a835
 637af9c0 6038776b 634ef32b 00000000
Call Trace:
 [<6001c86d>] show_stack+0xfe/0x15b
 [<6038a835>] dump_stack+0x2a/0x2c
 [<6038776b>] panic+0x13e/0x2b3
 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
 [<601cfbbe>] open_ctree+0x192d/0x27af
 [<6019c2c1>] btrfs_mount+0x8f5/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<6019bcb0>] btrfs_mount+0x2e4/0xb9a
 [<600bc9a7>] mount_fs+0x11/0xf3
 [<600d5167>] vfs_kern_mount+0x75/0x11a
 [<600d710b>] do_mount+0xa35/0xbc9
 [<600d7557>] SyS_mount+0x95/0xc8
 [<6001e884>] handle_syscall+0x6b/0x8e
Reported-by: Jiri Slaby <jslaby@suse.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
CC: stable@vger.kernel.org	# 3.19+
Signed-off-by: David Sterba <dsterba@suse.com>

f5cdedd7
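
The approach in f5cdedd7, validating external data where it is read and returning an error instead of reaching a BUG_ON later, can be sketched as below. The structure is a hypothetical, reduced chunk header, and the choice of -EIO as the error code is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced on-disk chunk header: only the checked field. */
struct chunk_header { uint16_t num_stripes; };

/*
 * Validate data read from disk before it is used to compute sizes.
 * Returning an error lets the mount fail cleanly on a crafted or
 * corrupted image.
 */
static int validate_chunk(const struct chunk_header *chunk)
{
	assert(chunk);		/* a NULL pointer is a caller bug, not bad data */
	if (chunk->num_stripes == 0)
		return -EIO;	/* assumed error code for this sketch */
	return 0;
}
```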

btrfs: better packing of btrfs_delayed_extent_op · 35b3ad50

Committed by David Sterba on Nov 30, 2015

btrfs_delayed_extent_op can be packed in a better way, it's 40 bytes now
and has 8 unused bytes. Reducing the level type to u8 makes it possible
to squeeze it to the padding byte after key. The bitfields were switched
to bool as there's space to store the full byte without increasing the
whole structure, besides that the generated assembly is smaller.

struct btrfs_delayed_extent_op {
	struct btrfs_disk_key      key;                  /*     0    17 */
	u8                         level;                /*    17     1 */
	bool                       update_key;           /*    18     1 */
	bool                       update_flags;         /*    19     1 */
	bool                       is_data;              /*    20     1 */

	/* XXX 3 bytes hole, try to pack */

	u64                        flags_to_set;         /*    24     8 */

	/* size: 32, cachelines: 1, members: 6 */
	/* sum members: 29, holes: 1, sum holes: 3 */
	/* last cacheline: 32 bytes */
};

The final size is 32 bytes which gives +26 objects per slab page.

   text	   data	    bss	    dec	    hex	filename
 938811	  43670	  23144	1005625	  f5839	fs/btrfs/btrfs.ko.before
 938747	  43670	  23144	1005561	  f57f9	fs/btrfs/btrfs.ko.after
Signed-off-by: David Sterba <dsterba@suse.com>

35b3ad50
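
The repacking in 35b3ad50 can be reproduced in userspace. The "after" layout below follows the pahole output in the commit message. The "before" layout is an assumption reconstructed from the description (an int level and three bitfields placed after the u64), and `struct disk_key` is a hypothetical 17-byte packed stand-in for btrfs_disk_key. The `packed` attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 17-byte packed key, matching the size shown in the pahole output. */
struct disk_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
} __attribute__((packed));

/* Assumed "before" layout: int level and bitfields after the u64. */
struct extent_op_before {
	struct disk_key key;
	uint64_t flags_to_set;
	int level;
	unsigned int update_key:1;
	unsigned int update_flags:1;
	unsigned int is_data:1;
};

/* "After" layout: small members fill the padding that follows the key. */
struct extent_op_after {
	struct disk_key key;
	uint8_t level;
	bool update_key;
	bool update_flags;
	bool is_data;
	uint64_t flags_to_set;
};

/* Holds on common 64-bit ABIs; the exact sizes are ABI dependent. */
static_assert(sizeof(struct extent_op_after) == 32,
	      "repacked structure should be 32 bytes");
```

On x86-64 the assumed "before" structure comes out at 40 bytes, which agrees with the figure in the commit message.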

btrfs: put delayed item hook into inode · 8089fe62

Committed by David Sterba on Nov 19, 2015

Inodes for delayed iput allocate a trivial helper structure, let's place
the list hook directly into the inode and save a kmalloc (killing a
__GFP_NOFAIL as a bonus) at the cost of increasing size of btrfs_inode.

The inode can be put into the delayed_iputs list more than once and we
have to keep the count. This means we can't use list_splice to
process a bunch of inodes because we'd lose track of the count if the
inode is put into the delayed iputs again while it's processed.
Signed-off-by: David Sterba <dsterba@suse.com>

8089fe62

btrfs: Support convert to -d dup for btrfs-convert · c5ca8781

Committed by Zhao Lei on Nov 19, 2015

Since we will add support for -d dup for non-mixed filesystems, the
kernel needs to support converting to this raid type.

This patch removes the limitation for the above case.

Tested by following script:
(combination of dup conversion with fsck):

export TEST_DEV='/dev/vdc'
export TEST_DIR='/var/ltf/tester/mnt'

do_dup_test()
{
    local m_from="$1"
    local d_from="$2"
    local m_to="$3"
    local d_to="$4"

    echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"

    umount "$TEST_DIR" &>/dev/null
    ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
    mount "$TEST_DEV" "$TEST_DIR" || return 1

    cp -a /sbin/* "$TEST_DIR"

    [[ "$m_from" != "$m_to" ]] && {
        ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
    }

    [[ "$d_from" != "$d_to" ]] && {
	local opt=()
	[[ "$d_to" == single ]] && opt+=("-f")
        ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
    }

    umount "$TEST_DIR" || return 1
    ./btrfsck "$TEST_DEV" || return 1
    echo

    return 0
}

test_all()
{
    for m_from in single dup; do
    for d_from in single dup; do
    for m_to in single dup; do
    for d_to in single dup; do
    do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
    done
    done
    done
    done
}

test_all
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c5ca8781

Btrfs: igrab inode in writepage · be7bd730

Committed by Josef Bacik on Oct 22, 2015

We hit this panic on a few of our boxes this week where we have an
ordered_extent with a NULL inode. We do an igrab() of the inode in writepages,
but weren't doing it in writepage which can be called directly from the VM on
dirty pages. If the inode has been unlinked then we could have I_FREEING set
which means igrab() would return NULL and we get this panic. Fix this by trying
to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

be7bd730

Btrfs: add missing brelse when superblock checksum fails · b2acdddf

Committed by Anand Jain on Oct 07, 2015

Looks like an oversight: call brelse() when the checksum fails. Further down
the code, in the non-error path, we do call brelse(), and so we don't see
brelse() in the goto error paths.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b2acdddf
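
The bug class fixed in b2acdddf is an early error exit that skips the release of a reference taken earlier. The sketch below models it with a hypothetical reference-counted buffer. `get_buffer()` and `release_buffer()` stand in for acquiring the buffer and for brelse(), and a single exit label is used here as one way to keep every path balanced (the kernel function is structured differently).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer head with a reference count. */
struct buffer { int refcount; bool csum_ok; };

static struct buffer *get_buffer(struct buffer *b)
{
	assert(b);
	b->refcount++;
	return b;
}

static void release_buffer(struct buffer *b)
{
	assert(b && b->refcount > 0);
	b->refcount--;
}

/* Every exit path after get_buffer() must drop the reference. */
static int read_super(struct buffer *disk)
{
	struct buffer *bh = get_buffer(disk);
	int ret = 0;

	if (!bh->csum_ok) {
		ret = -EINVAL;
		goto out;	/* the checksum failure path releases too */
	}
	/* ... the superblock contents would be used here ... */
out:
	release_buffer(bh);
	return ret;
}
```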

Btrfs: fix transaction handle leak on failure to create hard link · 271dba45

Committed by Filipe Manana on Jan 05, 2016

If we failed to create a hard link we were not always releasing the
transaction handle we got before, resulting in a memory leak and
preventing any other tasks from being able to commit the current
transaction.
Fix this by always releasing our transaction handle.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

271dba45

01 Jan, 2016 3 commits

Btrfs: fix number of transaction units required to create symlink · 9269d12b

Committed by Filipe Manana on Dec 31, 2015

We weren't accounting for the insertion of an inline extent item for the
symlink inode nor that we need to update the parent inode item (through
the call to btrfs_add_nondir()). So fix this by including two more
transaction units.
Signed-off-by: Filipe Manana <fdmanana@suse.com>

9269d12b

Btrfs: don't leave dangling dentry if symlink creation failed · d50866d0

Committed by Filipe Manana on Dec 31, 2015

When we are creating a symlink we might fail with an error after we
created its inode and added the corresponding directory indexes to its
parent inode. In this case we end up never removing the directory indexes
because the inode eviction handler, called for our symlink inode on the
final iput(), only removes items associated with the symlink inode and
not with the parent inode.

Example:

  $ mkfs.btrfs -f /dev/sdi
  $ mount /dev/sdi /mnt
  $ touch /mnt/foo
  $ ln -s /mnt/foo /mnt/bar
  ln: failed to create symbolic link ‘bar’: Cannot allocate memory
  $ umount /mnt
  $ btrfsck /dev/sdi
  Checking filesystem on /dev/sdi
  UUID: d5acb5ba-31bd-42da-b456-89dca2e716e1
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
	unresolved ref dir 256 index 3 namelen 3 name bar filetype 7 errors 4, no inode ref
  found 131073 bytes used err is 1
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124305
  file data blocks allocated: 262144
   referenced 262144
  btrfs-progs v4.2.3

So fix this by adding the directory index entries as the very last
step of symlink creation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>

d50866d0

Btrfs: send, don't BUG_ON() when an empty symlink is found · a879719b

Committed by Filipe Manana on Dec 31, 2015

When a symlink is successfully created it always has an inline extent
containing the source path. However if an error happens when creating
the symlink, we can leave in the subvolume's tree a symlink inode without
any such inline extent item - this happens if after btrfs_symlink() calls
btrfs_end_transaction() and before it calls the inode eviction handler
(through the final iput() call), the transaction gets committed and a
crash happens before the eviction handler gets called, or if a snapshot
of the subvolume is made before the eviction handler gets called. Sadly
we can't just avoid this by making btrfs_symlink() call
btrfs_end_transaction() after it calls the eviction handler, because the
latter can commit the current transaction before it removes any items from
the subvolume tree (if it encounters ENOSPC errors while reserving space
for removing all the items).

So make send fail more gracefully, with an -EIO error, and print a
message to dmesg/syslog informing that there's an empty symlink inode,
so that the user can delete the empty symlink or do something else
about it.
Reported-by: Stephen R. van den Berg <srb@cuci.nl>
Signed-off-by: Filipe Manana <fdmanana@suse.com>

a879719b

31 Dec, 2015 1 commit

Btrfs: fix race between free space endio workers and space cache writeout · 2bc0bb5f

Committed by Filipe Manana on Dec 30, 2015

While running a stress test I ran into the following trace/transaction
abort:

[471626.672243] ------------[ cut here ]------------
[471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
[471626.675492] BTRFS: Transaction aborted (error -2)
[471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix
[471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[471626.691901]  0000000000000000 ffff880016037cf0 ffffffff812566f4 ffff880016037d38
[471626.695009]  ffff880016037d28 ffffffff8104d0a6 ffffffffa040c84e 00000000fffffffe
[471626.697490]  ffff88011fe855f8 ffff88000c484cb0 ffff88000d195000 ffff880016037d90
[471626.699201] Call Trace:
[471626.699804]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[471626.701049]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[471626.702542]  [<ffffffffa040c84e>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.704326]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[471626.705636]  [<ffffffffa0403717>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
[471626.707048]  [<ffffffffa040c84e>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.708616]  [<ffffffffa048a50a>] commit_cowonly_roots+0x1d7/0x25a [btrfs]
[471626.709950]  [<ffffffffa041e34a>] btrfs_commit_transaction+0x4c4/0x991 [btrfs]
[471626.711286]  [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31
[471626.712611]  [<ffffffffa03f6df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[471626.715610]  [<ffffffff811962a2>] ? SyS_tee+0x226/0x226
[471626.716718]  [<ffffffff811962c2>] sync_fs_one_sb+0x20/0x22
[471626.717672]  [<ffffffff8116fc01>] iterate_supers+0x75/0xc2
[471626.718800]  [<ffffffff8119669a>] sys_sync+0x52/0x80
[471626.719990]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[471626.721835] ---[ end trace baf57f43d76693f4 ]---
[471626.722954] BTRFS: error (device sdc) in btrfs_write_dirty_block_groups:3740: errno=-2 No such entry

This is a very rare situation and it happened due to a race between a free
space endio worker and writing the space caches for dirty block groups at
a transaction's commit critical section. The steps leading to this are:

1) A task calls btrfs_commit_transaction() and starts the writeout of the
   space caches for all currently dirty block groups (i.e. it calls
   btrfs_start_dirty_block_groups());

2) The previous step starts writeback for space caches;

3) When the writeback finishes it queues jobs for free space endio work
   queue (fs_info->endio_freespace_worker) that execute
   btrfs_finish_ordered_io();

4) The task committing the transaction sets the transaction's state
   to TRANS_STATE_COMMIT_DOING and shortly after calls
   btrfs_write_dirty_block_groups();

5) A free space endio job joins the transaction, through
   btrfs_join_transaction_nolock(), and updates a free space inode item
   in the root tree through btrfs_update_inode_fallback();

6) Updating the free space inode item resulted in COWing one or more
   nodes/leaves of the root tree, and that resulted in creating a new
   metadata block group, which gets added to the transaction's list
   of dirty block groups (this is a very rare case);

7) The free space endio job has not released yet its transaction handle
   at this point, so the new metadata block group was not yet fully
   created (didn't go through btrfs_create_pending_block_groups() yet);

8) The transaction commit task sees the new metadata block group in
   the transaction's list of dirty block groups and processes it.
   When it attempts to update the block group's block group item in
   the extent tree, through write_one_cache_group(), it isn't able
   to find it and aborts the transaction with error -ENOENT - this
   is because the free space endio job hasn't yet released its
   transaction handle (which calls btrfs_create_pending_block_groups())
   and therefore the block group item was not yet added to the extent
   tree.

Fix this by waiting for free space endio jobs if we fail to find a block
group item in the extent tree, and then retrying the update of the block
group item once.
Signed-off-by: Filipe Manana <fdmanana@suse.com>

2bc0bb5f
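
The fix described in 2bc0bb5f is a "wait, then retry exactly once" pattern. The sketch below models it with a hypothetical state structure in which the block group item becomes visible once the pending endio work has finished. None of these names are the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the race: the item appears after endio completes. */
struct fs_state {
	bool endio_pending;	/* an endio job still holds its handle */
	bool item_in_tree;	/* block group item present in the extent tree */
	int waits;		/* how many times we waited for the workers */
};

static int update_block_group_item(const struct fs_state *fs)
{
	return fs->item_in_tree ? 0 : -ENOENT;
}

/* Stand-in for waiting until the free space endio jobs have finished. */
static void wait_for_endio_workers(struct fs_state *fs)
{
	fs->waits++;
	if (fs->endio_pending) {
		fs->endio_pending = false;
		fs->item_in_tree = true;	/* pending block group created */
	}
}

/* On -ENOENT, wait for the workers and retry the update exactly once. */
static int write_block_group(struct fs_state *fs)
{
	int ret;

	assert(fs);
	ret = update_block_group_item(fs);
	if (ret == -ENOENT) {
		wait_for_endio_workers(fs);
		ret = update_block_group_item(fs);
	}
	return ret;
}
```

If the item is still missing after the single retry, the error is returned to the caller, which in the scenario from the commit message would still lead to a transaction abort.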

openeuler / Kernel synced successfully more than 1 year ago
