1. 24 Mar, 2013 15 commits
    • bio-integrity: Add explicit field for owner of bip_buf · 29ed7813
      Kent Overstreet committed
      This was the only real user of BIO_CLONED, which didn't have very clear
      semantics. Convert to its own flag so we can get rid of BIO_CLONED.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Add an explicit bio flag for bios that own their bvec · a38352e0
      Kent Overstreet committed
      This is for the new bio splitting code. When we split a bio, if the
      split occurred on a bvec boundary we reuse the bvec for the new bio. But
      that means bio_free() can't free it, hence the explicit flag.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
    • block: Add bio_alloc_pages() · a0787606
      Kent Overstreet committed
      More utility code to replace stuff that's getting open coded.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
    • block: Convert some code to bio_for_each_segment_all() · cb34e057
      Kent Overstreet committed
      More prep work for immutable bvecs:
      
      A few places in the code were either open coding or using the wrong
      version - fix.
      
      After we introduce the bvec iter, it'll no longer be possible to modify
      the biovec through bio_for_each_segment_all() - it doesn't increment a
      pointer to the current bvec, you pass in a struct bio_vec (not a
      pointer) which is updated with what the current biovec would be (taking
      into account bi_bvec_done and bi_size).
      
      So because of that it's more worthwhile to be consistent about
      bio_for_each_segment()/bio_for_each_segment_all() usage.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Alexander Viro <viro@zeniv.linux.org.uk>
    • block: Add bio_for_each_segment_all() · d74c6d51
      Kent Overstreet committed
      __bio_for_each_segment() iterates bvecs from the specified index
      instead of bio->bi_idx.  Currently, the only usage is to walk all the
      bvecs after the bio has been advanced, by specifying index 0.
      
      For immutable bvecs, we need to split these apart;
      bio_for_each_segment() is going to have a different implementation.
      This will also help document the intent of code that's using it -
      bio_for_each_segment_all() is only legal to use for code that owns the
      bio.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Neil Brown <neilb@suse.de>
      CC: Boaz Harrosh <bharrosh@panasas.com>
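
The difference between the two iterators can be illustrated with a userspace toy model. This is a minimal sketch, not kernel code: `toy_bvec`, `toy_bio`, and both `toy_for_each_*` macros are hypothetical names that mimic only where each walk starts (the current index versus index 0).

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bio. */
struct toy_bvec {
	unsigned int bv_len;		/* bytes described by this segment */
};

struct toy_bio {
	struct toy_bvec *bi_io_vec;	/* segment array */
	unsigned short bi_vcnt;		/* number of segments in the array */
	unsigned short bi_idx;		/* first segment not yet completed */
};

/* Walk only the segments that are still pending: start at bi_idx. */
#define toy_for_each_segment(bv, bio, i)				\
	for ((i) = (bio)->bi_idx, (bv) = &(bio)->bi_io_vec[(i)];	\
	     (i) < (bio)->bi_vcnt; (i)++, (bv)++)

/* Walk every segment regardless of bi_idx: meant for the bio's owner. */
#define toy_for_each_segment_all(bv, bio, i)				\
	for ((i) = 0, (bv) = (bio)->bi_io_vec;				\
	     (i) < (bio)->bi_vcnt; (i)++, (bv)++)

/* Bytes left to process, as a driver handed this bio would see them. */
unsigned int toy_pending_bytes(struct toy_bio *bio)
{
	struct toy_bvec *bv;
	unsigned int total = 0;
	int i;

	assert(bio->bi_idx <= bio->bi_vcnt);
	toy_for_each_segment(bv, bio, i)
		total += bv->bv_len;
	return total;
}

/* Bytes in the whole segment array, as the owner would see them. */
unsigned int toy_total_bytes(struct toy_bio *bio)
{
	struct toy_bvec *bv;
	unsigned int total = 0;
	int i;

	toy_for_each_segment_all(bv, bio, i)
		total += bv->bv_len;
	return total;
}
```

For a bio whose first segment has already been completed, the two helpers return different totals, which is why mixing up the two macros is easy to get wrong.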
    • block: Add bio_copy_data() · 16ac3d63
      Kent Overstreet committed
      This gets open coded quite a bit and it's tricky to get right, so make a
      generic version and convert some existing users over to it instead.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
    • block: Add submit_bio_wait(), remove from md · 9e882242
      Kent Overstreet committed
      Random cleanup - this code was duplicated and it's not really specific
      to md.
      
      Also added the ability to return the actual error code.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
    • block: Remove bi_idx references · 4f2ac93c
      Kent Overstreet committed
      For immutable bvecs, all bi_idx usage needs to be audited - so here
      we're removing all the unnecessary uses.
      
      Most of these are places where it was being initialized on a bio that
      was just allocated, a few others are conversions to standard macros.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
    • block: Change bio_split() to respect the current value of bi_idx · 5b83636a
      Kent Overstreet committed
      In the current code bio_split() won't be seeing partially completed bios
      so this doesn't change any behaviour, but this makes the code a bit
      clearer as to what bio_split() actually requires.
      
      The immediate purpose of the patch is removing unnecessary bi_idx
      references, but the end goal is to allow partially completed bios to be
      submitted, which along with immutable biovecs enables efficient bio
      splitting.
      
      Some of the callers were (double) checking that bios could be split, so
      update their checks too.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Neil Brown <neilb@suse.de>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Use bio_sectors() more consistently · aa8b57aa
      Kent Overstreet committed
      Bunch of places in the code weren't using it where they could be -
      this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
      into a struct bvec_iter.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: "Ed L. Cashin" <ecashin@coraid.com>
      CC: Nick Piggin <npiggin@kernel.dk>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Jim Paris <jim@jtan.com>
      CC: Geoff Levand <geoff@infradead.org>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Ed Cashin <ecashin@coraid.com>
    • block: Add bio_end_sector() · f73a1c7d
      Kent Overstreet committed
      Just a little convenience macro - main reason to add it now is preparing
      for immutable bio vecs, it'll reduce the size of the patch that puts
      bi_sector/bi_size/bi_idx into a struct bvec_iter.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: linux-s390@vger.kernel.org
      CC: Chris Mason <chris.mason@fusionio.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
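
As a rough illustration of what such convenience macros compute, here is a userspace sketch. The `toy_` names and the two-field struct are hypothetical; the only assumptions carried over are that `bi_sector` is the starting sector, `bi_size` is the remaining length in bytes, and sectors are 512 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_sector_t;

/* Hypothetical stand-in holding only the two fields the macros read. */
struct toy_bio {
	toy_sector_t bi_sector;	/* first sector of the I/O */
	unsigned int bi_size;	/* remaining I/O size in bytes */
};

/* Size of the I/O in 512-byte sectors. */
#define toy_bio_sectors(bio)	((bio)->bi_size >> 9)

/* First sector past the end of the I/O. */
#define toy_bio_end_sector(bio)	((bio)->bi_sector + toy_bio_sectors(bio))

/* Returns 1 when b starts exactly where a ends, else 0. */
int toy_bios_contiguous(const struct toy_bio *a, const struct toy_bio *b)
{
	assert(a != NULL && b != NULL);
	return toy_bio_end_sector(a) == b->bi_sector;
}
```

Callers that go through the macros never touch `bi_sector` or `bi_size` directly, so moving those fields into another struct later only requires changing the macro bodies.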
    • block: Add bio_advance() · 054bdf64
      Kent Overstreet committed
      This is prep work for immutable bio vecs; we first want to centralize
      where bvecs are modified.
      
      Next two patches convert some existing code to use this function.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
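
To show what centralizing bvec modification could look like, here is a userspace sketch of an advance helper. It is an illustration under assumptions, not the patch itself: the `toy_` types are hypothetical, and integrity handling and kernel warnings are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bio. */
struct toy_bvec {
	unsigned int bv_len;	/* bytes left in this segment */
	unsigned int bv_offset;	/* offset of those bytes within the page */
};

struct toy_bio {
	uint64_t bi_sector;	/* current starting sector */
	unsigned int bi_size;	/* bytes remaining */
	unsigned short bi_idx;	/* current segment */
	unsigned short bi_vcnt;	/* number of segments */
	struct toy_bvec *bi_io_vec;
};

/*
 * Mark the first `bytes` bytes of the bio as done. This is the single
 * place where the position fields and the partially consumed segment
 * are updated.
 */
void toy_bio_advance(struct toy_bio *bio, unsigned int bytes)
{
	assert(bytes <= bio->bi_size);

	bio->bi_sector += bytes >> 9;
	bio->bi_size -= bytes;

	while (bytes && bio->bi_idx < bio->bi_vcnt) {
		struct toy_bvec *bv = &bio->bi_io_vec[bio->bi_idx];

		if (bytes >= bv->bv_len) {
			/* Whole segment consumed: step to the next one. */
			bytes -= bv->bv_len;
			bio->bi_idx++;
		} else {
			/* Partial segment: shrink it from the front. */
			bv->bv_len -= bytes;
			bv->bv_offset += bytes;
			bytes = 0;
		}
	}
}
```

With every caller going through one helper like this, a later change to how position is tracked only has to touch one function.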
    • block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Kent Overstreet committed
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Fix a buffer overrun in bio_integrity_split() · 6fda981c
      Kent Overstreet committed
      bio_integrity_split() seemed to be confusing pointers and arrays -
      bip_vec in bio_integrity_payload was an array appended to the end of the
      payload, so the bio_vecs in struct bio_pair should have come after the
      bio_integrity_payload they're for.
      
      Fix it by making bip_vec a pointer to the inline vecs - a later patch is
      going to make more use of this pointer.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
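
The pointer-versus-array distinction can be sketched in userspace as follows. The struct and function names are hypothetical and the layout is an assumption for illustration: the payload carries a pointer that normally targets its own trailing inline vectors, but can be aimed at vectors stored elsewhere without relying on memory layout.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_vec {
	unsigned int len;
	unsigned int offset;
};

/* Hypothetical payload: a pointer plus optional trailing inline storage. */
struct toy_payload {
	struct toy_vec *bip_vec;		/* vectors actually in use */
	unsigned short bip_vcnt;		/* number of vectors at bip_vec */
	struct toy_vec bip_inline_vecs[];	/* storage appended to the struct */
};

/* Allocate a payload with nr_inline trailing vectors and point at them. */
struct toy_payload *toy_payload_alloc(unsigned short nr_inline)
{
	struct toy_payload *p;

	p = malloc(sizeof(*p) + nr_inline * sizeof(struct toy_vec));
	if (!p)
		return NULL;
	p->bip_vcnt = nr_inline;
	p->bip_vec = p->bip_inline_vecs;
	return p;
}

/* Re-point the payload at vectors that live somewhere else. */
void toy_payload_use_external(struct toy_payload *p, struct toy_vec *vecs,
			      unsigned short nr)
{
	assert(p != NULL && vecs != NULL);
	p->bip_vec = vecs;
	p->bip_vcnt = nr;
}
```

Code that always dereferences `bip_vec` works in both cases, whereas code that assumes the vectors sit directly behind the payload is only correct for the inline case.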
    • block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Kent Overstreet committed
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
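
The recursion-to-iteration behaviour described above, which is the source of the deadlock risk, can be modelled in a few lines of userspace C. This is a toy sketch under assumptions: every name is hypothetical, there is no real allocation or rescuer workqueue, and the "driver" simply resubmits one child bio. It only demonstrates that a bio submitted from inside a driver is parked on a list and not processed until the driver returns.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio {
	int id;
	struct toy_bio *child;	/* bio the toy driver submits while handling us */
	struct toy_bio *next;	/* link on the deferred list */
};

static struct toy_bio *toy_head, *toy_tail;	/* deferred submissions */
static int toy_active;				/* nonzero while the loop runs */
static int toy_depth;				/* current driver nesting depth */

int toy_order[16];	/* ids in the order they were actually processed */
int toy_order_len;
int toy_max_depth;	/* deepest driver nesting observed */

void toy_submit(struct toy_bio *bio);

/* Hypothetical stacking driver: records the bio, then submits its child. */
static void toy_make_request(struct toy_bio *bio)
{
	toy_depth++;
	if (toy_depth > toy_max_depth)
		toy_max_depth = toy_depth;

	assert(toy_order_len < 16);
	toy_order[toy_order_len++] = bio->id;

	if (bio->child)
		toy_submit(bio->child);	/* only queued while we are still here */

	toy_depth--;
}

void toy_submit(struct toy_bio *bio)
{
	if (toy_active) {
		/* Called from inside a driver: defer instead of recursing. */
		bio->next = NULL;
		if (toy_tail)
			toy_tail->next = bio;
		else
			toy_head = bio;
		toy_tail = bio;
		return;
	}

	toy_active = 1;
	while (bio) {
		toy_make_request(bio);

		/* Pop the next deferred bio, if any. */
		bio = toy_head;
		if (bio) {
			toy_head = bio->next;
			if (!toy_head)
				toy_tail = NULL;
		}
	}
	toy_active = 0;
}
```

Because the child sits on the deferred list until the parent's handler returns, a handler that blocks waiting for memory held by an earlier child can never be satisfied by that child completing, which is the situation the rescuer is meant to break.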
  2. 16 Mar, 2013 1 commit
    • Btrfs: fix warning of free_extent_map · 3b277594
      Liu Bo committed
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we're going to drop an extent map and may split this large one into
      smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
      that it's on the list to be logged, then the smaller ems split from it will also
      be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) we'll keep ems from unlinking the list and freeing when they are flagged with
      EXTENT_FLAG_LOGGING, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
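
The fix amounts to masking one flag out when a split piece inherits its parent's flags. A minimal userspace sketch follows, assuming made-up bit numbers and a hypothetical `toy_em` struct rather than the real extent map layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers, chosen for illustration only. */
enum {
	TOY_EXTENT_FLAG_PINNED  = 0,
	TOY_EXTENT_FLAG_LOGGING = 1,
};

struct toy_em {
	uint64_t start;
	uint64_t len;
	unsigned long flags;
};

/*
 * Fill `split` with the sub-range [start, start + len) of `parent`.
 * The piece inherits the parent's flags except LOGGING, because the
 * piece itself was never put on the list of extents to be logged.
 */
void toy_split_em(const struct toy_em *parent, struct toy_em *split,
		  uint64_t start, uint64_t len)
{
	assert(start >= parent->start);
	assert(start + len <= parent->start + parent->len);

	split->start = start;
	split->len = len;
	split->flags = parent->flags & ~(1UL << TOY_EXTENT_FLAG_LOGGING);
}
```

The parent keeps its flag, so whatever holds a reference for logging still sees it, while the new pieces can be unlinked and freed normally.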
  3. 15 Mar, 2013 6 commits
  4. 14 Mar, 2013 2 commits
  5. 13 Mar, 2013 3 commits
    • ext2: Fix BUG_ON in evict() on inode deletion · c288d296
      Jan Kara committed
      Commit 8e3dffc6 introduced a regression where deleting an inode with
      large extended attributes triggers
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
      in fs/inode.c:evict(). That happens because freeing the xattr block
      dirtied the inode after clear_inode() had been called.
      
      Fix the issue by moving removal of the xattr block into ext2_evict_inode(),
      before the clear_inode() call and close to the place where data blocks are
      truncated. That is also a more logical place, and it removes the surprising
      requirement that ext2_free_blocks() mustn't dirty the inode.
      Reported-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fs: Readd the fs module aliases. · fa7614dd
      Eric W. Biederman committed
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch Linux uses these aliases.
      
      So readd the preexisting aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be able to delete these aliases
      without problems.  However that day is not today.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d
      Mathieu Desnoyers committed
      Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
      compat_process_vm_rw() shows that the compatibility code requires an
      explicit "access_ok()" check before calling
      compat_rw_copy_check_uvector(). The same difference seems to appear when
      we compare fs/read_write.c:do_readv_writev() to
      fs/compat.c:compat_do_readv_writev().
      
      This subtle difference between the compat and non-compat requirements
      should probably be debated, as it seems to be error-prone. In fact,
      there are two other sites that use this function in the Linux kernel,
      and they both seem to get it wrong:
      
      Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
      also ends up calling compat_rw_copy_check_uvector() through
      aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
      be missing. Same situation for
      security/keys/compat.c:compat_keyctl_instantiate_key_iov().
      
      I propose that we add the access_ok() check directly into
      compat_rw_copy_check_uvector(), so callers don't have to worry about it,
      and it therefore makes the compat call code similar to its non-compat
      counterpart. Place the access_ok() check in the same location where
      copy_from_user() can trigger a -EFAULT error in the non-compat code, so
      the ABI behaviors are alike on both compat and non-compat.
      
      While we are here, fix compat_do_readv_writev() so it checks for
      compat_rw_copy_check_uvector() negative return values.
      
      And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
      handling.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 Mar, 2013 4 commits
  7. 11 Mar, 2013 2 commits
  8. 09 Mar, 2013 2 commits
  9. 07 Mar, 2013 5 commits