1. 12 2月, 2011 9 次提交
    • M
      mlock: do not munlock pages in __do_fault() · 419d8c96
      Michel Lespinasse 提交于
      If the page is going to be written to, __do_page needs to break COW.
      
      However, the old page (before breaking COW) was never mapped mapped into
      the current pte (__do_fault is only called when the pte is not present),
      so vmscan can't have marked the old page as PageMlocked due to being
      mapped in __do_fault's VMA.  Therefore, __do_fault() does not need to
      worry about clearing PageMlocked() on the old page.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      419d8c96
    • M
      mlock: fix race when munlocking pages in do_wp_page() · e15f8c01
      Michel Lespinasse 提交于
      vmscan can lazily find pages that are mapped within VM_LOCKED vmas, and
      set the PageMlocked bit on these pages, transfering them onto the
      unevictable list.  When do_wp_page() breaks COW within a VM_LOCKED vma,
      it may need to clear PageMlocked on the old page and set it on the new
      page instead.
      
      This change fixes an issue where do_wp_page() was clearing PageMlocked
      on the old page while the pte was still pointing to it (as well as
      rmap).  Therefore, we were not protected against vmscan immediately
      transfering the old page back onto the unevictable list.  This could
      cause pages to get stranded there forever.
      
      I propose to move the corresponding code to the end of do_wp_page(),
      after the pte (and rmap) have been pointed to the new page.
      Additionally, we can use munlock_vma_page() instead of
      clear_page_mlock(), so that the old page stays mlocked if there are
      still other VM_LOCKED vmas mapping it.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e15f8c01
    • Y
      memblock: don't adjust size in memblock_find_base() · e6d2e2b2
      Yinghai Lu 提交于
      While applying patch to use memblock to find aperture for 64bit x86.
      Ingo found system with 1g + force_iommu
      
      > No AGP bridge found
      > Node 0: aperture @ 38000000 size 32 MB
      > Aperture pointing to e820 RAM. Ignoring.
      > Your BIOS doesn't leave a aperture memory hole
      > Please enable the IOMMU option in the BIOS setup
      > This costs you 64 MB of RAM
      > Cannot allocate aperture memory hole (0,65536K)
      
      the corresponding code:
      
      	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
      	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
      		printk(KERN_ERR
      			"Cannot allocate aperture memory hole (%lx,%uK)\n",
      				addr, aper_size>>10);
      		return 0;
      	}
      	memblock_x86_reserve_range(addr, addr + aper_size, "aperture64")
      
      fails because memblock core code align the size with 512M.  That could
      make size way too big.
      
      So don't align the size in that case.
      
      actually __memblock_alloc_base, the another caller already align that
      before calling that function.
      
      BTW. x86 does not use __memblock_alloc_base...
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6d2e2b2
    • S
      nbd: remove module-level ioctl mutex · de1f016f
      Soren Hansen 提交于
      Commit 2a48fc0a ("block: autoconvert trivial BKL users to private
      mutex") replaced uses of the BKL in the nbd driver with mutex
      operations.  Since then, I've been been seeing these lock ups:
      
       INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       qemu-nbd      D 0000000000000001     0 16115  16114 0x00000004
        ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
        0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
        ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
       Call Trace:
        [<ffffffff815b9997>] __mutex_lock_slowpath+0xf7/0x180
        [<ffffffff815b93eb>] mutex_lock+0x2b/0x50
        [<ffffffffa071a21c>] nbd_ioctl+0x6c/0x1c0 [nbd]
        [<ffffffff812cb970>] blkdev_ioctl+0x230/0x730
        [<ffffffff811967a1>] block_ioctl+0x41/0x50
        [<ffffffff81175c03>] do_vfs_ioctl+0x93/0x370
        [<ffffffff81175f61>] sys_ioctl+0x81/0xa0
        [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b
      
      Instrumenting the nbd module's ioctl handler with some extra logging
      clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
      ioctl in the sense that it doesn't return until another ioctl asks the
      driver to disconnect.  However, that other ioctl blocks, waiting for the
      module-level mutex that replaced the BKL, and then we're stuck.
      
      This patch removes the module-level mutex altogether.  It's clearly
      wrong, and as far as I can see, it's entirely unnecessary, since the nbd
      driver maintains per-device mutexes, and I don't see anything that would
      require a module-level (or kernel-level, for that matter) mutex.
      Signed-off-by: NSoren Hansen <soren@linux2go.dk>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NPaul Clements <paul.clements@steeleye.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@kernel.org>		[2.6.37.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de1f016f
    • A
      drivers/rtc/rtc-proc.c: add module_put on error path in rtc_proc_open() · 24a6f5b8
      Alexander Strakh 提交于
      In file drivers/rtc/rtc-proc.c seq_open() can return -ENOMEM.
      
       86        if (!try_module_get(THIS_MODULE))
       87                return -ENODEV;
       88
       89        return single_open(file, rtc_proc_show, rtc);
      
      In this case before exiting (line 89) from rtc_proc_open the
      module_put(THIS_MODULE) must be called.
      
      Found by Linux Device Drivers Verification Project
      Signed-off-by: NAlexander Strakh <strakh@ispras.ru>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24a6f5b8
    • R
      drivers/gpio/pca953x.c: add a mutex to fix race condition · 6e20fb18
      Roland Stigge 提交于
      Add a mutex to register communication and handling.  Without the mutex,
      GPIOs didn't switch as expected when toggled in a fast sequence of
      status changes of multiple outputs.
      Signed-off-by: NRoland Stigge <stigge@antcom.de>
      Acked-by: NEric Miao <eric.y.miao@gmail.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Marc Zyngier <maz@misterjones.org>
      Cc: Ben Gardner <bgardner@wabtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e20fb18
    • T
      ptrace: use safer wake up on ptrace_detach() · 01e05e9a
      Tejun Heo 提交于
      The wake_up_process() call in ptrace_detach() is spurious and not
      interlocked with the tracee state.  IOW, the tracee could be running or
      sleeping in any place in the kernel by the time wake_up_process() is
      called.  This can lead to the tracee waking up unexpectedly which can be
      dangerous.
      
      The wake_up is spurious and should be removed but for now reduce its
      toxicity by only waking up if the tracee is in TRACED or STOPPED state.
      
      This bug can possibly be used as an attack vector.  I don't think it
      will take too much effort to come up with an attack which triggers oops
      somewhere.  Most sleeps are wrapped in condition test loops and should
      be safe but we have quite a number of places where sleep and wakeup
      conditions are expected to be interlocked.  Although the window of
      opportunity is tiny, ptrace can be used by non-privileged users and with
      some loading the window can definitely be extended and exploited.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01e05e9a
    • B
      vfs: call rcu_barrier after ->kill_sb() · d863b50a
      Boaz Harrosh 提交于
      In commit fa0d7e3d ("fs: icache RCU free inodes"), we use rcu free
      inode instead of freeing the inode directly.  It causes a crash when we
      rmmod immediately after we umount the volume[1].
      
      So we need to call rcu_barrier after we kill_sb so that the inode is
      freed before we do rmmod.  The idea is inspired by Aneesh Kumar.
      rcu_barrier will wait for all callbacks to end before preceding.  The
      original patch was done by Tao Ma, but synchronize_rcu() is not enough
      here.
      
      1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2Tested-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d863b50a
    • L
      Fix possible filp_cachep memory corruption · 2dab5974
      Linus Torvalds 提交于
      In commit 31e6b01f ("fs: rcu-walk for path lookup") we started doing
      path lookup using RCU, which then falls back to a careful non-RCU lookup
      in case of problems (LOOKUP_REVAL).  So do_filp_open() has this "re-do
      the lookup carefully" looping case.
      
      However, that means that we must not release the open-intent file data
      if we are going to loop around and use it once more!
      
      Fix this by moving the release of the open-intent data to the function
      that allocates it (do_filp_open() itself) rather than the helper
      functions that can get called multiple times (finish_open() and
      do_last()).  This makes the logic for the lifetime of that field much
      more obvious, and avoids the possible double free.
      Reported-by: NJ. R. Okajima <hooanon05@yahoo.co.jp>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2dab5974
  2. 11 2月, 2011 8 次提交
  3. 10 2月, 2011 13 次提交
  4. 09 2月, 2011 10 次提交