1. 26 Mar, 2018 · 5 commits
  2. 22 Jan, 2018 · 12 commits
  3. 13 Jan, 2018 · 2 commits
    • error-injection: Add injectable error types · 663faf9f
      Committed by Masami Hiramatsu
      Add injectable error types for each error-injectable function.
      
      One motivation of error injection testing is to find software flaws:
      mistakes or mishandling of expectable errors. If the test finds such
      a flaw, that is a program bug, so we need to fix it.
      
      But if the tester injects the wrong kind of error (e.g. just returns a
      success code without processing anything), it causes unexpected behavior
      even if the caller is correctly programmed to handle any errors.
      That is not what we want to test by error injection.
      
      To clarify what type of errors the caller must expect for each
      injectable function, this introduces injectable error types:
      
       - EI_ETYPE_NULL : means the function will return NULL if it
      		    fails. No ERR_PTR, just a NULL.
       - EI_ETYPE_ERRNO : means the function will return -ERRNO
      		    if it fails.
       - EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
      		       (ERR_PTR) or NULL.
      
      ALLOW_ERROR_INJECTION() macro is expanded to get one of
      NULL, ERRNO, ERRNO_NULL to record the error type for
      each function. e.g.
      
       ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
      
      These error types are shown in debugfs as below.
      
        ====
        / # cat /sys/kernel/debug/error_injection/list
        open_ctree [btrfs]	ERRNO
        io_ctl_init [btrfs]	ERRNO
        ====
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
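The three error types described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `ei_retval_allowed()`, `ei_example()`, and `EX_MAX_ERRNO` are hypothetical names that do not come from the commit, and the check is a plausible reading of the type descriptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the three error types listed above. */
enum ei_etype {
	EI_ETYPE_NULL,       /* fails by returning NULL only */
	EI_ETYPE_ERRNO,      /* fails by returning -ERRNO */
	EI_ETYPE_ERRNO_NULL, /* fails by returning -ERRNO (ERR_PTR) or NULL */
};

/* Assumption: error codes occupy [-4095, -1], mirroring the ERR_PTR range. */
#define EX_MAX_ERRNO 4095

/* Return true if retval is a failure value a caller of this type expects. */
bool ei_retval_allowed(enum ei_etype type, long retval)
{
	bool is_errno = retval < 0 && retval >= -EX_MAX_ERRNO;

	switch (type) {
	case EI_ETYPE_NULL:
		return retval == 0;
	case EI_ETYPE_ERRNO:
		return is_errno;
	case EI_ETYPE_ERRNO_NULL:
		return retval == 0 || is_errno;
	}
	return false;
}

/* Usage: an ERRNO-type function must not be made to "fail" with 0. */
void ei_example(void)
{
	assert(ei_retval_allowed(EI_ETYPE_ERRNO, -12));
	assert(!ei_retval_allowed(EI_ETYPE_ERRNO, 0));
}
```

Recording the type per function lets a test harness reject an injected value, such as a bare success code, that a correctly written caller was never required to handle.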
    • error-injection: Separate error-injection from kprobe · 540adea3
      Committed by Masami Hiramatsu
      The error-injection framework is not limited to use by kprobes
      or bpf. Other kernel subsystems, e.g. livepatch and ftrace, can
      use it freely for checking the safety of error injection.
      So this separates the error-injection framework from kprobes.
      
      Some changes have been made:
      
      
      - "kprobe" word is removed from any APIs/structures.
      - BPF_ALLOW_ERROR_INJECTION() is renamed to
        ALLOW_ERROR_INJECTION() since it is not limited to BPF either.
      - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
        feature. It is automatically enabled if the arch supports
        the error injection feature for kprobe or ftrace etc.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 13 Dec, 2017 · 1 commit
  5. 07 Dec, 2017 · 1 commit
  6. 28 Nov, 2017 · 1 commit
    • btrfs: tree-checker: Fix false panic for sanity test · 69fc6cbb
      Committed by Qu Wenruo
      [BUG]
      If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will
      instantly cause kernel panic like:
      
      ------
      ...
      assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853
      ...
      Call Trace:
       btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs]
       setup_items_for_insert+0x385/0x650 [btrfs]
       __btrfs_drop_extents+0x129a/0x1870 [btrfs]
      ...
      -----
      
      [Cause]
      Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check
      if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.
      
      However, quite a few btrfs_mark_buffer_dirty() callers(*) don't
      initialize their item data but only initialize their item pointers,
      leaving item data uninitialized.
      
      This makes tree-checker catch uninitialized data as an error, causing
      such a panic.
      
      *: These callers include but are not limited to
      setup_items_for_insert()
      btrfs_split_item()
      btrfs_expand_item()
      
      [Fix]
      Add a new parameter @check_item_data to btrfs_check_leaf().
      With @check_item_data set to false, the item data check is skipped and
      we fall back to the old btrfs_check_leaf() behavior.
      
      So we can still get an early warning if we screw up item pointers, and
      avoid a false panic.
      
      Cc: Filipe Manana <fdmanana@gmail.com>
      Reported-by: Lakshmipathi.G <lakshmipathi.g@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 02 Nov, 2017 · 2 commits
    • btrfs: track refs in a rb_tree instead of a list · 0e0adbcf
      Committed by Josef Bacik
      If we get a significant amount of delayed refs for a single block (think
      modifying multiple snapshots) we can end up spending an ungodly amount
      of time looping through all of the entries trying to see if they can be
      merged.  This is because we only add them to a list, so we have O(2n)
      for every ref head.  This doesn't make any sense as we likely have refs
      for different roots, and so they cannot be merged.  Tracking in a tree
      will allow us to break as soon as we hit an entry that doesn't match,
      making our worst case O(n).
      
      With this we can also merge entries more easily.  Before we had to hope
      that matching refs were on the ends of our list, but with the tree we
      can search down to exact matches and merge them at insert time.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make the delalloc block rsv per inode · 69fe2d75
      Committed by Josef Bacik
      The way we handle delalloc metadata reservations has gotten
      progressively more complicated over the years.  There is so much cruft
      and weirdness around keeping the reserved count and outstanding counters
      consistent and handling the error cases that it's impossible to
      understand.
      
      Fix this by making the delalloc block rsv per-inode.  This way we can
      calculate the actual size of the outstanding metadata reservations every
      time we make a change, and then reserve the delta based on that amount.
      This greatly simplifies the code everywhere, and makes the error
      handling in btrfs_delalloc_reserve_metadata far less terrifying.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
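The "recompute the size, then reserve the delta" idea from this commit can be sketched as follows. Everything here is hypothetical: `struct ex_inode_rsv`, `ex_rsv_recalc()`, and the flat `bytes_per_extent` cost are illustrative assumptions, not the kernel's data structures or accounting rules.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-inode reservation state; not the kernel's struct. */
struct ex_inode_rsv {
	uint64_t reserved; /* bytes currently reserved for this inode */
};

/*
 * Recompute the needed size from the current count of outstanding extents
 * and return the signed delta to reserve (> 0) or release (< 0).
 * bytes_per_extent is an assumed flat cost used only for illustration.
 */
int64_t ex_rsv_recalc(struct ex_inode_rsv *rsv, uint32_t outstanding_extents,
		      uint64_t bytes_per_extent)
{
	uint64_t new_size = (uint64_t)outstanding_extents * bytes_per_extent;
	int64_t delta = (int64_t)new_size - (int64_t)rsv->reserved;

	rsv->reserved = new_size; /* assume the reserve/release succeeded */
	return delta;
}

/* Usage: grow to three extents, then shrink back to one. */
void ex_rsv_example(void)
{
	struct ex_inode_rsv rsv = { 0 };

	assert(ex_rsv_recalc(&rsv, 3, 4096) == 12288);
	assert(ex_rsv_recalc(&rsv, 1, 4096) == -8192);
}
```

Because the target size is always derived from current state, there is no separate running counter that error paths have to keep in sync.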
  8. 30 Oct, 2017 · 13 commits
    • btrfs: Replace opencoded sizes with their symbolic constants · d4417e22
      Committed by Nikolay Borisov
      Currently btrfs' code uses a mix of opencoded sizes and defines from sizes.h.
      Let's unify the code base to always use the symbolic constants. No functional
      changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove delayed_ref_node from ref_head · d278850e
      Committed by Josef Bacik
      This is just excessive information in the ref_head, and makes the code
      complicated.  It is a relic from when we had the heads and the refs in
      the same tree, which is no longer the case.  With this removal I've
      cleaned up a bunch of the cruft around this old assumption as well.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: add a extent ref verify tool · fd708b81
      Committed by Josef Bacik
      We were having corruption issues that were tied back to problems with
      the extent tree.  In order to track them down I built this tool to try
      and find the culprit, which was pretty successful.  If you compile with
      this tool enabled, it will verify every ref update that the fs makes,
      live, and make sure it is consistent and valid.  I've run this through
      xfstests and haven't gotten any false positives.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update error messages, add fixup from Dan Carpenter to handle errors
        of read_tree_block ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: remove nr_async_submits and async_submit_draining · 736cd52e
      Committed by Liu Bo
      Now that we have the combo of flushing twice, which can make sure IO
      have started since the second flush will wait for page lock which
      won't be unlocked unless setting page writeback and queuing ordered
      extents, we don't need %async_submit_draining, %async_delalloc_pages
      and %nr_async_submits to tell whether the IO has actually started.
      
      Moreover, all the flushers in use are followed by functions that wait
      for ordered extents to complete, so %nr_async_submits, which tracks
      whether bio's async submit has made progress, doesn't really make
      sense.
      
      However, %async_delalloc_pages is still required by shrink_delalloc()
      as that function doesn't flush twice in the normal case (just issues a
      writeback with WB_REASON_FS_FREE_SPACE).
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: remove nr_async_bios · f851689b
      Committed by Liu Bo
      This was intended to congest higher layers to not send bios, but as
      
      1) the congested bit has been taken by writeback
      
      Async bios come from buffered writes and DIO writes.
      
      For DIO writes, we want to submit them ASAP, while for buffered writes,
      writeback uses balance_dirty_pages() to throttle how much dirty pages we
      can have.
      
      2) and no one is waiting for %nr_async_bios down to zero,
      
      Historically, it was introduced along with changes which let the
      checksumming workload spread across different cpus.  And at that time,
      pdflush was used instead of per-bdi flushing; perhaps pdflush did not
      have the necessary information for writeback to do throttling.
      
      We can safely remove them now.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      [ additional explanation from mails, removed unused variable 'limit' ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Move leaf and node validation checker to tree-checker.c · 557ea5dd
      Committed by Qu Wenruo
      It's no doubt the comprehensive tree block checker will become larger,
      so moving them into their own files is quite reasonable.
      Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      [ wording adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Add checker for EXTENT_CSUM · 4b865cab
      Committed by Qu Wenruo
      EXTENT_CSUM checker is a relatively easy one, only needs to check:
      
      1) Objectid
         Fixed to BTRFS_EXTENT_CSUM_OBJECTID
      
      2) Key offset alignment
         Must be aligned to sectorsize
      
      3) Item size alignment
         Must be aligned to csum size
      Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
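The three EXTENT_CSUM checks listed in this commit can be sketched in user-space C. This is a simplified illustration, not the tree-checker code: `struct ex_csum_key`, `ex_csum_item_ok()`, and `EX_CSUM_OBJECTID` are hypothetical stand-ins, and the objectid value is a placeholder chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder objectid for the example; stands in for BTRFS_EXTENT_CSUM_OBJECTID. */
#define EX_CSUM_OBJECTID ((uint64_t)-10)

/* Hypothetical, simplified key: only the fields the checks look at. */
struct ex_csum_key {
	uint64_t objectid;
	uint64_t offset;
};

/* Apply the three checks: fixed objectid, key offset alignment, item size alignment. */
bool ex_csum_item_ok(const struct ex_csum_key *key, uint32_t item_size,
		     uint32_t sectorsize, uint32_t csum_size)
{
	assert(sectorsize != 0 && csum_size != 0);

	if (key->objectid != EX_CSUM_OBJECTID)
		return false; /* 1) objectid must be the fixed csum objectid */
	if (key->offset % sectorsize != 0)
		return false; /* 2) key offset must be sectorsize-aligned */
	if (item_size % csum_size != 0)
		return false; /* 3) item size must be a multiple of csum size */
	return true;
}

/* Usage: an aligned item passes, a misaligned key offset fails. */
void ex_csum_example(void)
{
	struct ex_csum_key key = { EX_CSUM_OBJECTID, 8192 };

	assert(ex_csum_item_ok(&key, 8, 4096, 4));
	key.offset = 100;
	assert(!ex_csum_item_ok(&key, 8, 4096, 4));
}
```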
    • btrfs: Add sanity check for EXTENT_DATA when reading out leaf · 40c3c409
      Committed by Qu Wenruo
      Add extra checks for item with EXTENT_DATA type.  This checks the
      following thing:
      
      0) Key offset
         All key offsets must be aligned to sectorsize.
         Inline extent must have 0 for key offset.
      
      1) Item size
         Uncompressed inline file extent size must match item size.
         (Compressed inline file extent has no information about its on-disk size.)
         Regular/preallocated file extent size must be a fixed value.
      
      2) Every member of regular file extent item
         Including alignment for bytenr and offset, possible value for
         compression/encryption/type.
      
      3) Type/compression/encode must be one of the valid values.
      
      This should be the most comprehensive and strict check in the context
      of btrfs_item for EXTENT_DATA.
      Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ switch to BTRFS_FILE_EXTENT_TYPES, similar to what
        BTRFS_COMPRESS_TYPES does ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Check if item pointer overlaps with the item itself · 7f43d4af
      Committed by Qu Wenruo
      Function check_leaf() checks if any item pointer points outside of the
      leaf, but it doesn't check if the pointer overlaps with the item itself.
      
      Normally only the last item may be the victim, but adding such check is
      never a bad idea anyway.
      Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Refactor check_leaf function for later expansion · c3267bba
      Committed by Qu Wenruo
      Current check_leaf() function does a good job checking key order and
      item offset/size.
      
      However, it only checks from slot 0 to the second-to-last slot, which
      works but makes later expansion hard.
      
      So this refactoring iterates from slot 0 to the last slot.
      For key comparison, it uses a key with all 0 as initial key, so all
      valid keys should be larger than that.
      
      And for item size/offset checks, it compares current item end with
      previous item offset.
      For slot 0, use leaf end as a special case.
      
      This makes later item/key offset checks and item size checks easier to
      implement.
      
      Also, make check_leaf() return -EUCLEAN rather than -EIO to indicate
      an error.
      Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
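The loop shape described in this commit (every slot visited, an all-zero initial key, and each item's end compared with the previous item's offset) can be sketched as below. This is a simplified model under stated assumptions: `struct ex_item` and `ex_check_leaf()` are hypothetical, the key is collapsed to a single integer rather than the real multi-field key, and `EUCLEAN` gets a fallback definition for platforms whose `errno.h` lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* fallback for platforms without EUCLEAN */
#endif

/* Hypothetical simplified item: a scalar key plus data offset and size. */
struct ex_item {
	uint64_t key;
	uint32_t offset; /* start of the item's data inside the leaf */
	uint32_t size;   /* length of the item's data */
};

/* Check key order and data layout for every slot, from 0 to the last one. */
int ex_check_leaf(const struct ex_item *items, size_t nr, uint32_t leaf_end)
{
	uint64_t prev_key = 0;            /* all-zero initial key */
	uint32_t expected_end = leaf_end; /* slot 0 must end at the leaf end */

	for (size_t slot = 0; slot < nr; slot++) {
		const struct ex_item *it = &items[slot];

		/* Keys must be strictly increasing, starting above zero. */
		if (it->key <= prev_key)
			return -EUCLEAN;
		/* Item end must meet the previous item's offset exactly. */
		if ((uint64_t)it->offset + it->size != expected_end)
			return -EUCLEAN;

		prev_key = it->key;
		expected_end = it->offset;
	}
	return 0;
}

/* Usage: two items packed back to back from the end of a 100-byte leaf. */
void ex_check_leaf_example(void)
{
	struct ex_item items[] = { { 1, 90, 10 }, { 2, 70, 20 } };

	assert(ex_check_leaf(items, 2, 100) == 0);
}
```

Seeding the loop with a zero key and the leaf end removes the special handling of the first and last slots, which is what makes per-item checks easy to add inside the loop later.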
    • Btrfs: remove bio_flags which indicates a meta block of log-tree · 18fdc679
      Committed by Liu Bo
      Since both committing transaction and writing log-tree are doing
      plugging on metadata IO, we can unify to use %sync_writers to benefit
      both cases, instead of checking bio_flags while writing meta blocks of
      log-tree.
      
      We can remove this bio_flags because in order to write dirty blocks,
      log tree also uses btrfs_write_marked_extents(), inside which we
      have enabled %sync_writers. Therefore every write goes in a
      synchronous way, and so does checksumming.
      
      Please also note that, bio_flags is applied per-context while
      %sync_writers is applied per-inode, so this might incur some overhead, ie.
      
      1) while log tree is flushing its dirty blocks via
         btrfs_write_marked_extents(), in which %sync_writers is increased
         by one.
      
      2) in the meantime, some writeback operations may happen upon btrfs's
         metadata inode, so these writes go synchronously, too.
      
      However, AFAICS, the overhead is not a big one, while the win is that
      we unify the two places that need synchronous writes and remove a
      special hack/flag.
      
      This removes the bio_flags related stuff for writing log-tree.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: make plug in writing meta blocks really work · 6300463b
      Committed by Liu Bo
      We have started a plug in btrfs_write_and_wait_marked_extents(), but the
      generated IOs actually go to the device's scheduled IO list, where the
      work is done in another task, so the started plug has no effect.
      
      And since we wait for IOs immediately after writing meta blocks, it's
      the same case as writing the log tree: doing sync submit can merge more
      IOs.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: copy fsid to super_block s_uuid · ee87cf5e
      Committed by Anand Jain
      We didn't copy fsid to struct super_block.s_uuid so Overlay disables
      index feature with btrfs as the lower FS.
      
      kernel: overlayfs: fs on '/lower' does not support file handles, falling back to index=off.
      
      Fix this by publishing the fsid through struct super_block.s_uuid.
      
      [ dsterba: I think that setting s_uuid is the last missing bit. Overlay
        needs the file handle encoding support from the lower filesystem, which
        is supported. Filling the whole filesystem id is correct, the subvolume
        id is encoded in the file handle buffer from inside btrfs_encode_fh. ]
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 26 Sep, 2017 · 1 commit
  10. 24 Aug, 2017 · 2 commits
    • Btrfs: fix blk_status_t/errno confusion · 58efbc9f
      Committed by Omar Sandoval
      This fixes several instances of blk_status_t and bare errno ints being
      mixed up, some of which are real bugs.
      
      In the normal case, 0 matches BLK_STS_OK, so we don't observe any
      effects of the missing conversion, but in case of errors or passes
      through the repair/retry paths, the errors get mixed up.
      
      The changes were identified using 'sparse', we don't have reports of the
      buggy behaviour.
      
      Fixes: 4e4cbee9 ("block: switch bios to blk_status_t")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
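Why the confusion is invisible on success but harmful on failure can be shown with a toy model. The type and values below are hypothetical: `ex_blk_status_t`, the `EX_BLK_STS_*` constants, and both conversion helpers are made up for this sketch and are not the block layer's definitions. Only the shape of the problem (two different encodings that agree at 0) follows the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status type: small positive codes, unlike negative errnos. */
typedef unsigned char ex_blk_status_t;

enum {
	EX_BLK_STS_OK = 0,
	EX_BLK_STS_NOSPC = 1,
	EX_BLK_STS_IOERR = 2,
};

/* Explicit conversion from a negative errno to the status encoding. */
ex_blk_status_t ex_errno_to_blk_status(int err)
{
	switch (err) {
	case 0:
		return EX_BLK_STS_OK;
	case -ENOSPC:
		return EX_BLK_STS_NOSPC;
	default:
		return EX_BLK_STS_IOERR; /* anything else collapses to a generic I/O error */
	}
}

/* Explicit conversion back from the status encoding to a negative errno. */
int ex_blk_status_to_errno(ex_blk_status_t status)
{
	switch (status) {
	case EX_BLK_STS_OK:
		return 0;
	case EX_BLK_STS_NOSPC:
		return -ENOSPC;
	default:
		return -EIO;
	}
}

/* Usage: 0 survives a missing conversion, an error value does not. */
void ex_blk_status_example(void)
{
	assert((ex_blk_status_t)0 == EX_BLK_STS_OK);
	assert((ex_blk_status_t)(-EIO) != EX_BLK_STS_IOERR);
	assert(ex_errno_to_blk_status(-EIO) == EX_BLK_STS_IOERR);
}
```

Giving the two encodings distinct types is what lets a static checker such as sparse flag an assignment that skips the conversion.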
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>