1. 26 2月, 2011 5 次提交
    • J
      aio: fix race between io_destroy() and io_submit() · 7137c6bd
      Jan Kara 提交于
      A race can occur when io_submit() races with io_destroy():
      
       CPU1						CPU2
      io_submit()
        do_io_submit()
          ...
          ctx = lookup_ioctx(ctx_id);
      						io_destroy()
          Now do_io_submit() holds the last reference to ctx.
          ...
          queue new AIO
          put_ioctx(ctx) - frees ctx with active AIOs
      
      We solve this issue by checking whether ctx is being destroyed in AIO
      submission path after adding new AIO to ctx.  Then we are guaranteed that
      either io_destroy() waits for new AIO or we see that ctx is being
      destroyed and bail out.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7137c6bd
    • N
      aio: fix rcu ioctx lookup · 3bd9a5d7
      Nick Piggin 提交于
      aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
      
      lookup_ioctx doesn't implement the rcu lookup pattern properly.
      rcu_read_lock does not prevent refcount going to zero, so we might take
      a refcount on a zero count ioctx.
      
      Fix the bug by atomically testing for zero refcount before incrementing.
      
      [jack@suse.cz: added comment into the code]
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3bd9a5d7
    • T
      ldm: corrupted partition table can cause kernel oops · 294f6cf4
      Timo Warns 提交于
      The kernel automatically evaluates partition tables of storage devices.
      The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
      a bug that causes a kernel oops on certain corrupted LDM partitions.  A
      kernel subsystem seems to crash, because, after the oops, the kernel no
      longer recognizes newly connected storage devices.
      
      The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.
      Signed-off-by: NTimo Warns <warns@pre-sense.de>
      Cc: Eugene Teo <eugeneteo@kernel.sg>
      Acked-by: NRichard Russon <ldm@flatcap.org>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      294f6cf4
    • D
      epoll: prevent creating circular epoll structures · 22bacca4
      Davide Libenzi 提交于
      In several places, an epoll fd can call another file's ->f_op->poll()
      method with ep->mtx held.  This is in general unsafe, because that other
      file could itself be an epoll fd that contains the original epoll fd.
      
      The code defends against this possibility in its own ->poll() method using
      ep_call_nested, but there are several other unsafe calls to ->poll
      elsewhere that can be made to deadlock.  For example, the following simple
      program causes the call in ep_insert recursively call the original fd's
      ->poll, leading to deadlock:
      
       #include <unistd.h>
       #include <sys/epoll.h>
      
       int main(void) {
           int e1, e2, p[2];
           struct epoll_event evt = {
               .events = EPOLLIN
           };
      
           e1 = epoll_create(1);
           e2 = epoll_create(2);
           pipe(p);
      
           epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
           epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
           write(p[1], p, sizeof p);
           epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);
      
           return 0;
       }
      
      On insertion, check whether the inserted file is itself a struct epoll,
      and if so, do a recursive walk to detect whether inserting this file would
      create a loop of epoll structures, which could lead to deadlock.
      
      [nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NNelson Elhage <nelhage@ksplice.com>
      Reported-by: NNelson Elhage <nelhage@ksplice.com>
      Tested-by: NNelson Elhage <nelhage@ksplice.com>
      Cc: <stable@kernel.org>		[2.6.34+, possibly earlier]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22bacca4
    • A
      afs: Fix oops in afs_unlink_writeback · f129ccc9
      Anton Blanchard 提交于
      I'm seeing the following oops when testing afs:
      
        Unable to handle kernel paging request for data at address 0x00000008
        ...
        NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0
        LR [c00000000033987c] .afs_put_writeback+0x98/0xec
        Call Trace:
        [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec
        [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c
        [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320
        [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4
        [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc
        [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178
        [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128
        [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8
        [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0
        [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40
      
      afs_write_begin hits an error and calls afs_unlink_writeback. In there
      we do list_del_init on an uninitialised list.
      
      The patch below initialises ->link when creating the afs_writeback struct.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f129ccc9
  2. 25 2月, 2011 1 次提交
  3. 24 2月, 2011 4 次提交
    • J
      Unlock vfsmount_lock in do_umount · bf9faa2a
      J. R. Okajima 提交于
      By the commit
      	b3e19d92 2011-01-07 fs: scale mntget/mntput
      vfsmount_lock was introduced around testing mnt_count.
      Fix the mis-typed 'unlock'
      Signed-off-by: NJ. R. Okajima <hooanon05@yahoo.co.jp>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bf9faa2a
    • N
      Fix over-zealous flush_disk when changing device size. · 93b270f7
      NeilBrown 提交于
      There are two cases when we call flush_disk.
      In one, the device has disappeared (check_disk_change) so any
      data will hold becomes irrelevant.
      In the oter, the device has changed size (check_disk_size_change)
      so data we hold may be irrelevant.
      
      In both cases it makes sense to discard any 'clean' buffers,
      so they will be read back from the device if needed.
      
      In the former case it makes sense to discard 'dirty' buffers
      as there will never be anywhere safe to write the data.  In the
      second case it *does*not* make sense to discard dirty buffers
      as that will lead to file system corruption when you simply enlarge
      the containing devices.
      
      flush_disk calls __invalidate_devices.
      __invalidate_device calls both invalidate_inodes and invalidate_bdev.
      
      invalidate_inodes *does* discard I_DIRTY inodes and this does lead
      to fs corruption.
      
      invalidate_bev *does*not* discard dirty pages, but I don't really care
      about that at present.
      
      So this patch adds a flag to __invalidate_device (calling it
      __invalidate_device2) to indicate whether dirty buffers should be
      killed, and this is passed to invalidate_inodes which can choose to
      skip dirty inodes.
      
      flusk_disk then passes true from check_disk_change and false from
      check_disk_size_change.
      
      dm avoids tripping over this problem by calling i_size_write directly
      rathher than using check_disk_size_change.
      
      md does use check_disk_size_change and so is affected.
      
      This regression was introduced by commit 608aeef1 which causes
      check_disk_size_change to call flush_disk, so it is suitable for any
      kernel since 2.6.27.
      
      Cc: stable@kernel.org
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Andrew Patterson <andrew.patterson@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      93b270f7
    • M
      mm: prevent concurrent unmap_mapping_range() on the same inode · 2aa15890
      Miklos Szeredi 提交于
      Michael Leun reported that running parallel opens on a fuse filesystem
      can trigger a "kernel BUG at mm/truncate.c:475"
      
      Gurudas Pai reported the same bug on NFS.
      
      The reason is, unmap_mapping_range() is not prepared for more than
      one concurrent invocation per inode.  For example:
      
        thread1: going through a big range, stops in the middle of a vma and
           stores the restart address in vm_truncate_count.
      
        thread2: comes in with a small (e.g. single page) unmap request on
           the same vma, somewhere before restart_address, finds that the
           vma was already unmapped up to the restart address and happily
           returns without doing anything.
      
      Another scenario would be two big unmap requests, both having to
      restart the unmapping and each one setting vm_truncate_count to its
      own value.  This could go on forever without any of them being able to
      finish.
      
      Truncate and hole punching already serialize with i_mutex.  Other
      callers of unmap_mapping_range() do not, and it's difficult to get
      i_mutex protection for all callers.  In particular ->d_revalidate(),
      which calls invalidate_inode_pages2_range() in fuse, may be called
      with or without i_mutex.
      
      This patch adds a new mutex to 'struct address_space' to prevent
      running multiple concurrent unmap_mapping_range() on the same mapping.
      
      [ We'll hopefully get rid of all this with the upcoming mm
        preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
        lockbreak" patch in particular.  But that is for 2.6.39 ]
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reported-by: NMichael Leun <lkml20101129@newton.leun.net>
      Reported-by: NGurudas Pai <gurudas.pai@oracle.com>
      Tested-by: NGurudas Pai <gurudas.pai@oracle.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2aa15890
    • C
      Btrfs: fix fiemap bugs with delalloc · ec29ed5b
      Chris Mason 提交于
      The Btrfs fiemap code wasn't properly returning delalloc extents,
      so applications that trust fiemap to decide if there are holes in the
      file see holes instead of delalloc.
      
      This reworks the btrfs fiemap code, adding a get_extent helper that
      searches for delalloc ranges and also adding a helper for extent_fiemap
      that skips past holes in the file.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      ec29ed5b
  4. 23 2月, 2011 2 次提交
  5. 22 2月, 2011 6 次提交
  6. 20 2月, 2011 4 次提交
  7. 18 2月, 2011 2 次提交
  8. 17 2月, 2011 11 次提交
  9. 15 2月, 2011 5 次提交