提交 · f37938e0e22f90740cfbfe74ecd9aad5e8369b9d · openanolis / cloud-kernel

17 2月, 2015 8 次提交

btrfs: factor btrfs_init_btree_inode() out of open_ctree() · f37938e0

由 Eric Sandeen 提交于 8月 01, 2014

Signed-off-by: NEric Sandeen <sandeen@redhat.com>
[renamed to btrfs_init_btree_inode]
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

f37938e0

btrfs: factor btrfs_init_balance() out of open_ctree() · 779a65a4

由 Eric Sandeen 提交于 8月 01, 2014

Signed-off-by: NEric Sandeen <sandeen@redhat.com>
[renamed to btrfs_init_balance]
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

779a65a4

btrfs: factor btrfs_init_scrub() out of open_ctree() · 638aa7ed

由 Eric Sandeen 提交于 8月 01, 2014

Signed-off-by: NEric Sandeen <sandeen@redhat.com>
[renamed to btrfs_init_scrub]
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

638aa7ed

btrfs: consistently use fs_info in close_ctree() · 04892340

由 Eric Sandeen 提交于 8月 01, 2014

close_ctree() has a local fs_info var for convienience;
use it consistently.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

04892340

btrfs: remove unused fs_info arg from btrfs_close_extra_devices() · 9eaed21e

由 Eric Sandeen 提交于 8月 01, 2014

The commit:
8dabb742 Btrfs: change core code of btrfs to support the
        device replace operations
added the fs_info argument, but never used it -
just remove it again.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

9eaed21e

btrfs: fix sizeof format specifier in btrfs_check_super_valid() · 41d6b13e

由 Fabian Frederick 提交于 2月 14, 2015

This patch fixes mips compilation warning:

fs/btrfs/disk-io.c: In function 'btrfs_check_super_valid':
fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument
of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

41d6b13e

btrfs: constify structs with op functions or static definitions · e8c9f186

由 David Sterba 提交于 1月 02, 2015

There are some op tables that can be easily made const, similarly the
sysfs feature and raid tables. This is motivated by PaX CONSTIFY plugin.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

e8c9f186

Btrfs: disk-io: replace root args iff only fs_info used · 01d58472

由 Daniel Dressler 提交于 11月 21, 2014

This is the 3rd independent patch of a larger project to cleanup btrfs's
internal usage of btrfs_root. Many functions take btrfs_root only to
grab the fs_info struct.

By requiring a root these functions cause programmer overhead. That
these functions can accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such functions to accept the
fs_info directly.

These patches can be applied independently and thus are not being
submitted as a patch series. There should be about 26 patches by the
project's completion. Each patch will cleanup between 1 and 34 functions
apiece.  Each patch covers a single file's functions.

This patch affects the following function(s):
  1) csum_tree_block
  2) csum_dirty_buffer
  3) check_tree_block_fsid
  4) btrfs_find_tree_block
  5) clean_tree_block
Signed-off-by: NDaniel Dressler <danieru.dressler@gmail.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

01d58472

03 2月, 2015 3 次提交

Btrfs: fix race between transaction commit and empty block group removal · d4b450cd

由 Filipe Manana 提交于 1月 29, 2015

Committing a transaction can race with automatic removal of empty block
groups (cleaner kthread), leading to a BUG_ON() in the transaction
commit code while running btrfs_finish_extent_commit(). The following
sequence diagram shows how it can happen:

           CPU 1                                       CPU 2

btrfs_commit_transaction()
  fs_info->running_transaction = NULL
  btrfs_finish_extent_commit()
    find_first_extent_bit()
      -> found range for block group X
         in fs_info->freed_extents[]

                                               btrfs_delete_unused_bgs()
                                                 -> found block group X

                                                 Removed block group X's range
                                                 from fs_info->freed_extents[]

                                                 btrfs_remove_chunk()
                                                    btrfs_remove_block_group(bg X)

    unpin_extent_range(bg X range)
       btrfs_lookup_block_group(bg X)
          -> returns NULL
            -> BUG_ON()

The trace that results from the BUG_ON() is:

[48665.187808] ------------[ cut here ]------------
[48665.188032] kernel BUG at fs/btrfs/extent-tree.c:5675!
[48665.188032] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[48665.188032] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc evdev microcode
[48665.197388] CPU: 2 PID: 31211 Comm: kworker/u32:16 Tainted: G        W      3.19.0-rc5-btrfs-next-4+ #1
[48665.197388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[48665.197388] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[48665.197388] task: ffff880222011810 ti: ffff8801b56a4000 task.ti: ffff8801b56a4000
[48665.197388] RIP: 0010:[<ffffffffa0350d05>]  [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
[48665.197388] RSP: 0018:ffff8801b56a7b88  EFLAGS: 00010246
[48665.197388] RAX: 0000000000000000 RBX: ffff8802143a6000 RCX: ffff8802220120c8
[48665.197388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800a3c140b0
[48665.197388] RBP: ffff8801b56a7bd8 R08: 0000000000000003 R09: 0000000000000000
[48665.197388] R10: 0000000000000000 R11: 000000000000bbac R12: 0000000012e8e000
[48665.197388] R13: ffff8800a3c14000 R14: 0000000000000000 R15: 0000000000000000
[48665.197388] FS:  0000000000000000(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
[48665.197388] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[48665.197388] CR2: 00007f065e42f270 CR3: 0000000206f70000 CR4: 00000000000006e0
[48665.197388] Stack:
[48665.197388]  ffff8801b56a7bd8 0000000012ea0000 01ff8800a3c14138 0000000012e9ffff
[48665.197388]  ffff880141df3dd8 ffff8802143a6000 ffff8800a3c14138 ffff880141df3df0
[48665.197388]  ffff880141df3dd8 0000000000000000 ffff8801b56a7c08 ffffffffa0354227
[48665.197388] Call Trace:
[48665.197388]  [<ffffffffa0354227>] btrfs_finish_extent_commit+0xb0/0xd9 [btrfs]
[48665.197388]  [<ffffffffa0366b4b>] btrfs_commit_transaction+0x791/0x92c [btrfs]
[48665.197388]  [<ffffffffa0352432>] flush_space+0x43d/0x452 [btrfs]
[48665.197388]  [<ffffffff814295c3>] ? _raw_spin_unlock+0x28/0x33
[48665.197388]  [<ffffffffa035255f>] btrfs_async_reclaim_metadata_space+0x118/0x164 [btrfs]
[48665.197388]  [<ffffffff81059917>] ? process_one_work+0x14b/0x3ab
[48665.197388]  [<ffffffff810599ac>] process_one_work+0x1e0/0x3ab
[48665.197388]  [<ffffffff81079fa9>] ? trace_hardirqs_off+0xd/0xf
[48665.197388]  [<ffffffff8105a55b>] worker_thread+0x210/0x2d0
[48665.197388]  [<ffffffff8105a34b>] ? rescuer_thread+0x2c3/0x2c3
[48665.197388]  [<ffffffff8105e5c0>] kthread+0xef/0xf7
[48665.197388]  [<ffffffff81429682>] ? _raw_spin_unlock_irq+0x2d/0x39
[48665.197388]  [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
[48665.197388]  [<ffffffff81429dec>] ret_from_fork+0x7c/0xb0
[48665.197388]  [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
[48665.197388] Code: 85 f6 74 14 49 8b 06 49 03 46 09 49 39 c4 72 1d 4c 89 f7 e8 83 ec ff ff 4c 89 e6 4c 89 ef e8 1e f1 ff ff 48 85 c0 49 89 c6 75 02 <0f> 0b 49 8b 1e 49 03 5e 09 48 8b
[48665.197388] RIP  [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
[48665.197388]  RSP <ffff8801b56a7b88>
[48665.272246] ---[ end trace b9c6ab9957521376 ]---

Fix this by ensuring that unpining the block group's range in
btrfs_finish_extent_commit() is done in a synchronized fashion
with removing the block group's range from freed_extents[]
in btrfs_delete_unused_bgs()

This race got introduced with the change:

    Btrfs: remove empty block groups automatically
    commit 47ab2a6cSigned-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

d4b450cd

btrfs: add checks for sys_chunk_array sizes · ce7fca5f

由 David Sterba 提交于 10月 31, 2014

Verify that possible minimum and maximum size is set, validity of
contents is checked in btrfs_read_sys_array.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

ce7fca5f

btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize · 75d6ad38

由 David Sterba 提交于 10月 31, 2014

I received a few crafted images from Jiri, all got through the recently
added superblock checks. The lower bounds checks for num_devices and
sector/node -sizes were missing and caused a crash during mount.

Tools for symbolic code execution were used to prepare the images
contents.
Reported-by: NJiri Slaby <jslaby@suse.cz>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

75d6ad38

22 1月, 2015 5 次提交

Btrfs: fix unused members in struct btrfs_root · 78f55e5e

由 Anand Jain 提交于 1月 13, 2015

There isn't any real use of following members of struct btrfs_root
so delete them.

struct kobject root_kobj;
struct completion kobj_unregister;
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

78f55e5e

btrfs: set proper message level for skinny metadata · 5efa0490

由 David Sterba 提交于 12月 19, 2014

This has been confusing people for too long, the message is really just
informative.

CC: <stable@vger.kernel.org> # 3.10+
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

5efa0490

btrfs: update message levels after checksum errors · f0954c66

由 David Sterba 提交于 12月 19, 2014

The errors are worth noting and might get missed with INFO level.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

f0954c66

btrfs: update message levels during failed mount · aa8ee312

由 David Sterba 提交于 12月 19, 2014

All error conditions from open_ctree shall be ERR. Warning would
suggest that something's wrong and we can continue.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

aa8ee312

btrfs: update message levels for errors · 68b663d1

由 David Sterba 提交于 12月 19, 2014

Several messages that point to some internal problem, level INFO is
wrong here.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

68b663d1

15 1月, 2015 2 次提交

btrfs: expand btrfs_find_item if found_key is NULL · 1d4c08e0

由 David Sterba 提交于 1月 02, 2015

If the found_key is NULL, then btrfs_find_item becomes a verbose wrapper
for simple btrfs_search_slot.

After we've removed all such callers, passing a NULL key is not valid
anymore.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

1d4c08e0

btrfs: fix leak of path in btrfs_find_item · 381cf658

由 David Sterba 提交于 1月 02, 2015

If btrfs_find_item is called with NULL path it allocates one locally but
does not free it. Affected paths are inserting an orphan item for a file
and for a subvol root.

Move the path allocation to the callers.

CC: <stable@vger.kernel.org> # 3.14+
Fixes: 3f870c28 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality")
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

381cf658

13 12月, 2014 4 次提交

btrfs: sink parameter len to alloc_extent_buffer · ce3e6984

由 David Sterba 提交于 6月 15, 2014

Because we're using globally known nodesize. Do the same for the sanity
test function variant.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

ce3e6984

btrfs: sink blocksize parameter to btrfs_find_create_tree_block · a83fffb7

由 David Sterba 提交于 6月 15, 2014

Finally it's clear that the requested blocksize is always equal to
nodesize, with one exception, the superblock.

Superblock has fixed size regardless of the metadata block size, but
uses the same helpers to initialize sys array/chunk tree and to work
with the chunk items. So it pretends to be an extent_buffer for a
moment, btrfs_read_sys_array is full of special cases, we're adding one
more.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

a83fffb7

D
btrfs: sink blocksize parameter to reada_tree_block_flagged · c0dcaa4d
由 David Sterba 提交于 6月 15, 2014
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
c0dcaa4d
D
btrfs: sink blocksize parameter to readahead_tree_block · d3e46fea
由 David Sterba 提交于 6月 15, 2014
```
All callers pass nodesize.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
d3e46fea

11 12月, 2014 1 次提交

Btrfs: fix fs corruption on transaction abort if device supports discard · 678886bd

由 Filipe Manana 提交于 12月 07, 2014

When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.

This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
referred by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.

I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:

  [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim

The corruption always happened when a transaction aborted and then fsck complained
like this:

   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   read block failed check_tree_block
   Couldn't open file system

In this case 94945280 corresponded to the root of a tree.
Using frace what I observed was the following sequence of steps happened:

   1) transaction N started, fs_info->pinned_extents pointed to
      fs_info->freed_extents[0];

   2) node/eb 94945280 is created;

   3) eb is persisted to disk;

   4) transaction N commit starts, fs_info->pinned_extents now points to
      fs_info->freed_extents[1], and transaction N completes;

   5) transaction N + 1 starts;

   6) eb is COWed, and btrfs_free_tree_block() called for this eb;

   7) eb range (94945280 to 94945280 + 16Kb) is added to
      fs_info->pinned_extents (fs_info->freed_extents[1]);

   8) Something goes wrong in transaction N + 1, like hitting ENOSPC
      for example, and the transaction is aborted, turning the fs into
      readonly mode. The stack trace I got for example:

      [112065.253935]  [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
      [112065.254271]  [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
      [112065.254567]  [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.261674]  [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
      [112065.261922]  [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
      [112065.262211]  [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.262545]  [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
      [112065.262771]  [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
      [112065.263105]  [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
      (...)
      [112065.264493] ---[ end trace dd7903a975a31a08 ]---
      [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
      [112065.264997] BTRFS info (device sdc): forced readonly

   9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
      fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
      turn calls btrfs_destroy_pinned_extent();

   10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
       marked as dirty in fs_info->freed_extents[], and for each one
       it calls discard, if the fs was mounted with "-o discard", and
       adds the range to the free space cache of the respective block
       group;

   11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
       sees the free space entries and performs a discard;

   12) After an umount and mount (or fsck), our eb's location on disk was full
       of zeroes, and it should have been untouched, because it was marked as
       dirty in the fs_info->pinned_extents tree, and therefore used by the
       trees that the last committed superblock points to.

Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point since the fs is now in readonly mode and
we won't write free space caches to disk anymore (otherwise we would leak space)
nor any new superblock. By not adding the ranges to the free space caches, it
prevents other code paths from allocating that space and write to it as well,
therefore being safer and simpler.

This isn't a new problem, as it's been present since 2011 (git commit
acce952b).

Cc: stable@vger.kernel.org  # any kernel released after 2011-01-06
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

678886bd

03 12月, 2014 1 次提交

Btrfs: fix race between fs trimming and block group remove/allocation · 04216820

由 Filipe Manana 提交于 11月 27, 2014

Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting all block groups
and then for each one to iterate its free space entries and perform a
discard operation against the space range represented by the free space
entries. However before performing a discard, the corresponding free space
entry is removed from the free space rbtree, and when the discard completes
it is added back to the free space rbtree.

If a block group remove operation happens while the discard is ongoing (or
before it starts and after a free space entry is hidden), we end up not
waiting for the discard to complete, remove the extent map that maps
logical address to physical addresses and the corresponding chunk metadata
from the the chunk and device trees. After that and before the discard
completes, the current running transaction can finish and a new one start,
allowing for new block groups that map to the same physical addresses to
be allocated and written to.

So fix this by keeping the extent map in memory until the discard completes
so that the same physical addresses aren't reused before it completes.

If the physical locations that are under a discard operation end up being
used for a new metadata block group for example, and dirty metadata extents
are written before the discard finishes (the VM might call writepages() of
our btree inode's i_mapping for example, or an fsync log commit happens) we
end up overwriting metadata with zeroes, which leads to errors from fsck
like the following:

        checking extents
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        read block failed check_tree_block
        owner ref check failed [833912832 16384]
        Errors found in extent allocation tree or chunk allocation
        checking free space cache
        checking fs roots
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        read block failed check_tree_block
        root 5 root dir 256 error
        root 5 inode 260 errors 2001, no inode item, link count wrong
                unresolved ref dir 256 index 0 namelen 8 name foobar_3 filetype 1 errors 6, no dir index, no inode ref
        root 5 inode 262 errors 2001, no inode item, link count wrong
                unresolved ref dir 256 index 0 namelen 8 name foobar_5 filetype 1 errors 6, no dir index, no inode ref
        root 5 inode 263 errors 2001, no inode item, link count wrong
        (...)
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

04216820

22 11月, 2014 1 次提交

Btrfs: make sure logged extents complete in the current transaction V3 · 50d9aa99

由 Josef Bacik 提交于 11月 21, 2014

Liu Bo pointed out that my previous fix would lose the generation update in the
scenario I described.  It is actually much worse than that, we could lose the
entire extent if we lose power right after the transaction commits.  Consider
the following

write extent 0-4k
log extent in log tree
commit transaction
	< power fail happens here
ordered extent completes

We would lose the 0-4k extent because it hasn't updated the actual fs tree, and
the transaction commit will reset the log so it isn't replayed.  If we lose
power before the transaction commit we are save, otherwise we are not.

Fix this by keeping track of all extents we logged in this transaction.  Then
when we go to commit the transaction make sure we wait for all of those ordered
extents to complete before proceeding.  This will make sure that if we lose
power after the transaction commit we still have our data.  This also fixes the
problem of the improperly updated extent generation.  Thanks,

cc: stable@vger.kernel.org
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

50d9aa99

21 11月, 2014 1 次提交

btrfs: fix typos in btrfs_check_super_valid · cd743fac

由 David Sterba 提交于 10月 31, 2014

Copy&paste errors in some messages and add few more missing macro
accessors.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

cd743fac

12 11月, 2014 2 次提交

btrfs: switch inode_cache option handling to pending changes · 7e1876ac

由 David Sterba 提交于 2月 05, 2014

The pending mount option(s) now share namespace and bits with the normal
options, and the existing one for (inode_cache) is unset unconditionally
at each transaction commit.

Introduce a separate namespace for pending changes and enhance the
descriptions of the intended change to use separate bits for each
action.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

7e1876ac

btrfs: add support for processing pending changes · 572d9ab7

由 David Sterba 提交于 2月 05, 2014

There are some actions that modify global filesystem state but cannot be
performed at the time of request, but later at the transaction commit
time when the filesystem is in a known state.

For example enabling new incompat features on-the-fly or issuing
transaction commit from unsafe contexts (sysfs handlers).
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

572d9ab7

28 10月, 2014 1 次提交

btrfs: use macro accessors in superblock validation checks · 21e7626b

由 David Sterba 提交于 10月 27, 2014

The initial patch c926093e (btrfs: add more superblock checks)
did not properly use the macro accessors that wrap endianness and the
code would not work correctly on big endian machines.
Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

21e7626b

04 10月, 2014 2 次提交

btrfs: add more superblock checks · c926093e

由 David Sterba 提交于 9月 30, 2014

Populate btrfs_check_super_valid() with checks that try to verify
consistency of superblock by additional conditions that may arise from
corrupted devices or bitflips. Some of tests are only hints and issue
warnings instead of failing the mount, basically when the checks are
derived from the data found in the superblock.

Tested on a broken image provided by Qu.
Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

c926093e

Btrfs: be aware of btree inode write errors to avoid fs corruption · 656f30db

由 Filipe Manana 提交于 9月 26, 2014

While we have a transaction ongoing, the VM might decide at any time
to call btree_inode->i_mapping->a_ops->writepages(), which will start
writeback of dirty pages belonging to btree nodes/leafs. This call
might return an error or the writeback might finish with an error
before we attempt to commit the running transaction. If this happens,
we might have no way of knowing that such error happened when we are
committing the transaction - because the pages might no longer be
marked dirty nor tagged for writeback (if a subsequent modification
to the extent buffer didn't happen before the transaction commit) which
makes filemap_fdata[write|wait]_range unable to find such pages (even
if they're marked with SetPageError).
So if this happens we must abort the transaction, otherwise we commit
a super block with btree roots that point to btree nodes/leafs whose
content on disk is invalid - either garbage or the content of some
node/leaf from a past generation that got cowed or deleted and is no
longer valid (for this later case we end up getting error messages like
"parent transid verify failed on 10826481664 wanted 25748 found 29562"
when reading btree nodes/leafs from disk).

Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
i_mapping would not be enough because we need to distinguish between
log tree extents (not fatal) vs non-log tree extents (fatal) and
because the next call to filemap_fdatawait_range() will catch and clear
such errors in the mapping - and that call might be from a log sync and
not from a transaction commit, which means we would not know about the
error at transaction commit time. Also, checking for the eb flag
EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
not be completely reliable, as the eb might be removed from memory and
read back when trying to get it, which clears that flag right before
reading the eb's pages from disk, making us not know about the previous
write error.

Using the new 3 flags for the btree inode also makes us achieve the
goal of AS_EIO/AS_ENOSPC when writepages() returns success, started
writeback for all dirty pages and before filemap_fdatawait_range() is
called, the writeback for all dirty pages had already finished with
errors - because we were not using AS_EIO/AS_ENOSPC,
filemap_fdatawait_range() would return success, as it could not know
that writeback errors happened (the pages were no longer tagged for
writeback).
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

656f30db

02 10月, 2014 9 次提交

D
btrfs: move checks for DUMMY_ROOT into a helper · fccb84c9
由 David Sterba 提交于 9月 29, 2014
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
fccb84c9
D
btrfs: hide typecast to definition of BTRFS_SEND_TRANS_STUB · 2755a0de
由 David Sterba 提交于 7月 31, 2014
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
2755a0de

btrfs: use slab for end_io_wq structures · 97eb6b69

由 David Sterba 提交于 7月 30, 2014

The structure is frequently reused.  Rename it according to the slab
name.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

97eb6b69

D
btrfs: use enum for wq endio metadata type · bfebd8b5
由 David Sterba 提交于 7月 30, 2014
```
The enum exists but is not consistently used.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
bfebd8b5

Btrfs: set default max_inline to 8KiB instead of 8MiB · 95ac567a

由 Filipe David Borba Manana 提交于 8月 08, 2013

8MiB is way too large and likely set by mistake. This is not
a significant issue as in practice the max amount of data
added to an inline extent is also limited by the page cache
and btree leaf sizes.
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

95ac567a

btrfs: remove blocksize from btrfs_alloc_free_block and rename · 4d75f8a9

由 David Sterba 提交于 6月 15, 2014

Rename to btrfs_alloc_tree_block as it fits to the alloc/find/free +
_tree_block family. The parameter blocksize was set to the metadata
block size, directly or indirectly.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

4d75f8a9

D
btrfs: remove unused parameter blocksize from btrfs_find_tree_block · 0308af44
由 David Sterba 提交于 6月 15, 2014
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
0308af44
D
btrfs: remove parameter blocksize from read_tree_block · ce86cd59
由 David Sterba 提交于 6月 15, 2014
```
We know the tree block size, no need to pass it around.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
ce86cd59

btrfs: return void from readahead_tree_block · 6197d86e

由 David Sterba 提交于 6月 15, 2014

Errors in readahead are not fatal and ignored elsewhere in the code.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

6197d86e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功