1. 24 Mar, 2013 3 commits
    • block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Committed by Kent Overstreet
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Fix a buffer overrun in bio_integrity_split() · 6fda981c
      Committed by Kent Overstreet
      bio_integrity_split() seemed to be confusing pointers and arrays -
      bip_vec in bio_integrity_payload was an array appended to the end of the
      payload, so the bio_vecs in struct bio_pair should have come after the
      bio_integrity_payload they're for.
      
      Fix it by making bip_vec a pointer to the inline vecs - a later patch is
      going to make more use of this pointer.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
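A minimal userspace sketch of the layout described in the commit above, assuming simplified stand-in types: `struct vec`, `struct payload`, `payload_alloc()` and `payload_use_external()` are hypothetical names for illustration, not the kernel definitions. It shows a payload whose `bip_vec` is a pointer that normally targets the inline vecs trailing the struct, but can be redirected to storage that lives elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec. */
struct vec {
    void *page;
    unsigned int len;
    unsigned int offset;
};

/* Simplified payload: bip_vec is a pointer, inline storage trails the struct. */
struct payload {
    unsigned int vcnt;
    struct vec *bip_vec;       /* points at inline_vecs or at an external array */
    struct vec inline_vecs[];  /* flexible array member, must stay last */
};

/* Allocate a payload with room for nr inline vecs and point bip_vec at them. */
static struct payload *payload_alloc(unsigned int nr)
{
    struct payload *p = calloc(1, sizeof(*p) + nr * sizeof(struct vec));

    if (!p)
        return NULL;
    p->vcnt = nr;
    p->bip_vec = p->inline_vecs;
    return p;
}

/* Redirect the payload to caller-owned storage, e.g. a vec kept in another struct. */
static void payload_use_external(struct payload *p, struct vec *v, unsigned int nr)
{
    assert(p && v);
    p->bip_vec = v;
    p->vcnt = nr;
}
```

With a pointer, code that embeds the payload inside a larger struct no longer has to assume the vecs sit directly behind it in memory.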
    • block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Committed by Kent Overstreet
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
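The forward-progress argument in the commit above can be illustrated with a single-threaded toy model. This is only a sketch under loud assumptions: `struct pool`, `POOL_SIZE`, `pool_alloc()`, `pool_queue()` and `rescuer_submit_pending()` are hypothetical names, completion is modelled as immediate, and the real patch uses a per-bio_set workqueue rather than a direct call.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 2  /* hypothetical reserve size */

struct pool {
    int free;     /* reserve entries currently available */
    int pending;  /* allocated but not yet submitted (models current->bio_list) */
    int rescued;  /* entries handed to the rescuer so far */
};

/* Models the rescuer: submitting a pending entry lets it complete and be freed. */
static void rescuer_submit_pending(struct pool *p)
{
    while (p->pending > 0) {
        p->pending--;
        p->free++;
        p->rescued++;
    }
}

/* Allocate one entry. Returns 0 on success, -1 if no progress is possible here. */
static int pool_alloc(struct pool *p)
{
    assert(p);
    if (p->free == 0) {
        if (p->pending == 0)
            return -1;  /* reserve is held elsewhere; a real allocator would sleep */
        /* We would block on entries we ourselves are holding back: punt them. */
        rescuer_submit_pending(p);
    }
    p->free--;
    return 0;
}

/* Queue an allocated entry; it is only submitted after the caller returns. */
static void pool_queue(struct pool *p)
{
    p->pending++;
}
```

Without the punt in `pool_alloc()`, the third allocation in a row would wait forever on entries that cannot complete until the caller returns, which is the deadlock the commit describes.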
  2. 16 Mar, 2013 1 commit
    • Btrfs: fix warning of free_extent_map · 3b277594
      Committed by Liu Bo
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we drop an extent map, we may split a large one into smaller ems. If
      the large one is flagged as EXTENT_FLAG_LOGGING, which means that it's on the
      list to be logged, then the smaller ems split from it are also flagged as
      EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) ems flagged with EXTENT_FLAG_LOGGING are kept from being unlinked from the
      list and freed, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
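The fix described above amounts to not inheriting one flag when a map is split. A minimal sketch, assuming a simplified bitmask model: `struct em`, `EM_FLAG_PINNED`, `EM_FLAG_LOGGING` and `em_split_tail()` are hypothetical names and values, not the Btrfs definitions.

```c
#include <assert.h>

/* Hypothetical flag masks; the real kernel flags are defined differently. */
#define EM_FLAG_PINNED  (1UL << 0)
#define EM_FLAG_LOGGING (1UL << 1)

/* Simplified extent map covering [start, start + len). */
struct em {
    unsigned long start;
    unsigned long len;
    unsigned long flags;
};

/*
 * Build the tail piece of orig starting at split_at. The piece inherits
 * every flag except LOGGING, because only the original map is on the
 * list of maps to be logged.
 */
static struct em em_split_tail(const struct em *orig, unsigned long split_at)
{
    struct em tail;

    assert(orig);
    assert(split_at > orig->start && split_at < orig->start + orig->len);
    tail.start = split_at;
    tail.len = orig->start + orig->len - split_at;
    tail.flags = orig->flags & ~EM_FLAG_LOGGING;
    return tail;
}
```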
  3. 15 Mar, 2013 6 commits
  4. 14 Mar, 2013 2 commits
  5. 13 Mar, 2013 3 commits
    • ext2: Fix BUG_ON in evict() on inode deletion · c288d296
      Committed by Jan Kara
      Commit 8e3dffc6 introduced a regression where deleting an inode with
      large extended attributes triggers
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
      in fs/inode.c:evict(). That happens because freeing the xattr block
      dirtied the inode after clear_inode() had been called.
      
      Fix the issue by moving removal of the xattr block into ext2_evict_inode(),
      before the clear_inode() call and close to the place where data blocks are
      truncated. That is also a more logical place and removes the surprising
      requirement that ext2_free_blocks() mustn't dirty the inode.
      Reported-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
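The ordering problem in the commit above can be modelled in a few lines. This is a sketch only: `struct inode_model`, the flag values, and the four helper functions are hypothetical stand-ins, with `clear_inode_model()` assumed to leave the state at exactly `I_FREEING | I_CLEAR`, as the BUG_ON condition expects.

```c
#include <assert.h>

/* Hypothetical flag values for the model. */
#define I_DIRTY   (1u << 0)
#define I_FREEING (1u << 1)
#define I_CLEAR   (1u << 2)

struct inode_model {
    unsigned int state;
    int has_xattr_block;
};

/* Freeing the xattr block updates accounting and so dirties the inode. */
static void free_xattr_block(struct inode_model *in)
{
    if (in->has_xattr_block) {
        in->has_xattr_block = 0;
        in->state |= I_DIRTY;
    }
}

/* Model of clear_inode(): the state becomes exactly FREEING | CLEAR. */
static void clear_inode_model(struct inode_model *in)
{
    in->state = I_FREEING | I_CLEAR;
}

/* Fixed ordering: release the xattr block first, then clear the inode. */
static void evict_inode_fixed(struct inode_model *in)
{
    assert(in);
    in->state |= I_FREEING;
    free_xattr_block(in);
    clear_inode_model(in);
}

/* Old ordering: the late free re-dirties an already cleared inode. */
static void evict_inode_old(struct inode_model *in)
{
    assert(in);
    in->state |= I_FREEING;
    clear_inode_model(in);
    free_xattr_block(in);
}
```

Only the fixed ordering leaves the state equal to `I_FREEING | I_CLEAR` for an inode that owns an xattr block.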
    • fs: Readd the fs module aliases. · fa7614dd
      Committed by Eric W. Biederman
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch Linux uses these aliases.
      
      So readd the preexisting aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be able to delete these aliases
      without problems.  However that day is not today.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d
      Committed by Mathieu Desnoyers
      Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
      compat_process_vm_rw() shows that the compatibility code requires an
      explicit "access_ok()" check before calling
      compat_rw_copy_check_uvector(). The same difference seems to appear when
      we compare fs/read_write.c:do_readv_writev() to
      fs/compat.c:compat_do_readv_writev().
      
      This subtle difference between the compat and non-compat requirements
      should probably be debated, as it seems to be error-prone. In fact,
      there are two other sites that use this function in the Linux kernel,
      and they both seem to get it wrong:
      
      Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
      also ends up calling compat_rw_copy_check_uvector() through
      aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
      be missing. Same situation for
      security/keys/compat.c:compat_keyctl_instantiate_key_iov().
      
      I propose that we add the access_ok() check directly into
      compat_rw_copy_check_uvector(), so callers don't have to worry about it,
      and it therefore makes the compat call code similar to its non-compat
      counterpart. Place the access_ok() check in the same location where
      copy_from_user() can trigger a -EFAULT error in the non-compat code, so
      the ABI behaviors are alike on both compat and non-compat.
      
      While we are here, fix compat_do_readv_writev() so it checks for
      compat_rw_copy_check_uvector() negative return values.
      
      And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
      handling.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
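The design choice proposed in the commit above, doing the range check inside the helper so callers cannot forget it, can be sketched in userspace. Everything here is an assumption for illustration: `user_mem`, `range_ok()`, `raw_read()` and `check_uvector()` are hypothetical names that only loosely model access_ok(), an unchecked user read, and the vector helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical "user" address space for the model. */
static unsigned char user_mem[64];

/* Model of access_ok(): the range must fit inside user_mem without wrapping. */
static int range_ok(size_t off, size_t len)
{
    return len <= sizeof(user_mem) && off <= sizeof(user_mem) - len;
}

/* Unchecked read: the caller must already have validated the range. */
static unsigned char raw_read(size_t off)
{
    return user_mem[off];
}

/*
 * Sum nr one-byte segment lengths stored at off. The range check lives
 * here, before any unchecked read, so every caller gets it for free.
 * Returns the total length, or -EFAULT for an invalid range.
 */
static long check_uvector(size_t off, size_t nr)
{
    long total = 0;

    if (!range_ok(off, nr))
        return -EFAULT;
    for (size_t i = 0; i < nr; i++)
        total += raw_read(off + i);
    return total;
}
```

Callers still have to treat a negative return value as an error rather than a length, which is the second fix the commit mentions for compat_do_readv_writev().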
  6. 12 Mar, 2013 4 commits
  7. 11 Mar, 2013 2 commits
  8. 09 Mar, 2013 2 commits
  9. 07 Mar, 2013 8 commits
  10. 06 Mar, 2013 1 commit
  11. 05 Mar, 2013 8 commits