1. 09 Feb, 2010 2 commits
  2. 08 Feb, 2010 1 commit
    • Fix race in tty_fasync() properly · 80e1e823
      Authored by Linus Torvalds
      This reverts commit 70362511 ("tty: fix race in tty_fasync") and
      commit b04da8bf ("fnctl: f_modown should call write_lock_irqsave/
      restore") that tried to fix up some of the fallout but was incomplete.
      
      It turns out that we really cannot hold 'tty->ctrl_lock' over calling
      __f_setown, because not only did that cause problems with interrupt
      disables (which the second commit fixed), it also causes a potential
      ABBA deadlock due to lock ordering.
      
      Thanks to Tetsuo Handa for following up on the issue, and running
      lockdep to show the problem.  It goes roughly like this:
      
       - f_getown gets filp->f_owner.lock for reading without interrupts
         disabled, so an interrupt that happens while that lock is held can
         cause a lockdep chain from f_owner.lock -> sighand->siglock.
      
       - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
         commit 70362511 introduced, together with the pre-existing
         sighand->siglock -> tty->ctrl_lock chain means that we have a lock
         dependency the other way too.
      
      So instead of extending tty->ctrl_lock over the whole __f_setown() call,
      we now just take a reference to the 'pid' structure while holding the
      lock, and then release it after having done the __f_setown.  That still
      guarantees that 'struct pid' won't go away from under us, which is all
      we really ever needed.
      Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
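The approach described in the commit above (pin the object while holding the lock, call out only after dropping it) can be sketched in user space. This is an analogy under stated assumptions, not the kernel code: `pid_ref`, `tty_like`, `set_owner` and `record_owner` are hypothetical names, a pthread mutex stands in for `tty->ctrl_lock`, and the callback stands in for `__f_setown()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted 'struct pid'. */
struct pid_ref {
    atomic_int count;
    int nr;
};

static struct pid_ref *pid_get(struct pid_ref *p)
{
    atomic_fetch_add(&p->count, 1);
    return p;
}

static void pid_put(struct pid_ref *p)
{
    /* A real implementation would free the object when this hits zero. */
    atomic_fetch_sub(&p->count, 1);
}

/* Hypothetical stand-in for the tty; ctrl_lock protects 'pgrp'. */
struct tty_like {
    pthread_mutex_t ctrl_lock;
    struct pid_ref *pgrp;
};

/* Stand-in for __f_setown(): it may take other locks internally,
 * so it must never be called with ctrl_lock held. */
typedef int (*setown_fn)(struct pid_ref *pid);

static int set_owner(struct tty_like *tty, struct pid_ref *self, setown_fn setown)
{
    struct pid_ref *pid;
    int ret;

    /* Hold the lock only long enough to choose and pin the pid. */
    pthread_mutex_lock(&tty->ctrl_lock);
    pid = pid_get(tty->pgrp ? tty->pgrp : self);
    pthread_mutex_unlock(&tty->ctrl_lock);

    /* The extra reference keeps 'pid' alive; no lock is held here. */
    ret = setown(pid);
    pid_put(pid);
    return ret;
}

/* Example callback that records what it was handed. */
static int seen_nr;
static int seen_count;

static int record_owner(struct pid_ref *pid)
{
    seen_nr = pid->nr;
    seen_count = atomic_load(&pid->count);
    return 0;
}
```

Because the callback runs outside the lock, no lock-ordering dependency is created between the lock guarding the pointer and whatever locks the callback takes.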
  3. 07 Feb, 2010 6 commits
  4. 05 Feb, 2010 6 commits
  5. 03 Feb, 2010 7 commits
    • NFS: Don't clobber the attribute type in nfs_update_inode() · 9b4b3513
      Authored by Trond Myklebust
      If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
      not set the S_IFMT part of inode->i_mode.
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
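The rule stated in the commit above can be illustrated with a small user-space helper. This is a sketch under assumptions: `ATTR_HAS_TYPE` and `ATTR_HAS_MODE` are hypothetical stand-ins for the NFS validity flags, `update_mode` is not a kernel function, and what happens when the type *is* marked valid is an assumption made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical validity flags, standing in for NFS_ATTR_FATTR_TYPE
 * and NFS_ATTR_FATTR_MODE. */
#define ATTR_HAS_TYPE 0x1u
#define ATTR_HAS_MODE 0x2u

/* Permission, setuid, setgid and sticky bits. */
#define PERM_BITS ((mode_t)07777)

/* Merge incoming attributes into the cached mode, touching only the
 * parts that the server marked as valid. */
static mode_t update_mode(mode_t cur, mode_t incoming, unsigned valid)
{
    mode_t next = cur;

    if (valid & ATTR_HAS_TYPE)   /* assumption: type updated only if valid */
        next = (next & ~(mode_t)S_IFMT) | (incoming & S_IFMT);
    if (valid & ATTR_HAS_MODE)   /* permission bits only, S_IFMT preserved */
        next = (next & ~PERM_BITS) | (incoming & PERM_BITS);
    return next;
}
```

With only the mode flag set, an incoming value whose type bits are zero leaves the cached file type untouched.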
    • NFS: Fix a umount race · 387c149b
      Authored by Trond Myklebust
      Ensure that we unregister the bdi before kill_anon_super() calls
      ida_remove() on our device name.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
    • NFS: Fix an Oops when truncating a file · 9f557cd8
      Authored by Trond Myklebust
      The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
      Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
      Since the NFS code assumes that the page stays mapped for as long as the
      writeback is active, we can end up Oopsing (among other things).
      
      The only safe fix here is to convert nfs_wait_on_request(), so as to make
      it uninterruptible (as is already the case with wait_on_page_writeback()).
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
    • GFS2: Extend umount wait coverage to full glock lifetime · 8f05228e
      Authored by Steven Whitehouse
      Although all glocks are, by the time of the umount glock wait,
      scheduled for demotion, some of them haven't made it far
      enough through the process for the original set of waiting
      code to wait for them.
      
      This extends the ref count to the whole glock lifetime in order
      to ensure that the waiting does catch all glocks. It does make
      it a bit more invasive, but it seems the only sensible solution
      at the moment.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Wait for unlock completion on umount · e402746a
      Authored by Steven Whitehouse
      This patch adds a wait on umount between the point at which we
      dispose of all glocks and the point at which we unmount the
      lock protocol. This ensures that we've received all the replies
      to our unlock requests before we stop the locking.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
    • mm: flush dcache before writing into page to avoid alias · 931e80e4
      Authored by anfei zhou
      The cache alias problem occurs if changes made through a user shared
      mapping are not flushed before copying: the user and kernel mappings may
      then land in two different cache lines, and coherence cannot be guaranteed
      after iov_iter_copy_from_user_atomic.  So the right steps should be:
      
      	flush_dcache_page(page);
      	kmap_atomic(page);
      	write to page;
      	kunmap_atomic(page);
      	flush_dcache_page(page);
      
      More precisely, we might create two new APIs, flush_dcache_user_page and
      flush_dcache_kern_page, to replace the two flush_dcache_page calls accordingly.
      
      Here is a snippet tested on omap2430 with VIPT cache, and I think it is
      not ARM-specific:
      
      	int val = 0x11111111;
      	fd = open("abc", O_RDWR);
      	addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	*(addr+0) = 0x44444444;
      	tmp = *(addr+0);
      	*(addr+1) = 0x77777777;
      	write(fd, &val, sizeof(int));
      	close(fd);
      
      The values at the beginning of the file are not always 0x11111111 0x77777777
      as expected.  Sometimes we see 0x44444444 0x77777777.
      Signed-off-by: Anfei <anfei.zhou@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <linux-arch@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
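The test snippet in the commit above omits declarations and setup. A self-contained variant might look like the following. It is a sketch under assumptions: the function name `alias_test`, the explicit `ftruncate()` to one page, the read-back via `pread()`, and the cleanup are additions that the original snippet does not show.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write through a shared mapping, then through write(2), and read the
 * first two ints of the file back into out[]. Returns 0 on success. */
static int alias_test(const char *path, int out[2])
{
    int val = 0x11111111;
    int tmp;
    int rc = -1;
    void *map;
    volatile int *addr;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    /* The file must cover the mapped page, or touching it raises SIGBUS. */
    if (ftruncate(fd, 4096) != 0)
        goto out_close;
    map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;
    addr = map;

    addr[0] = 0x44444444;   /* dirty the user mapping */
    tmp = addr[0];          /* read it back, as in the original snippet */
    (void)tmp;
    addr[1] = 0x77777777;

    /* Kernel-side write to offset 0 of the same page. */
    if (write(fd, &val, sizeof(val)) == (ssize_t)sizeof(val) &&
        pread(fd, out, 2 * sizeof(int), 0) == (ssize_t)(2 * sizeof(int)))
        rc = 0;

    munmap(map, 4096);
out_close:
    close(fd);
    unlink(path);
    return rc;
}
```

On hardware with coherent (non-aliasing) data caches the two values read back should be 0x11111111 and 0x77777777; the commit reports that on an aliasing VIPT cache the first value was sometimes 0x44444444.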
    • Fix 'flush_old_exec()/setup_new_exec()' split · 7ab02af4
      Authored by Linus Torvalds
      Commit 221af7f8 ("Split 'flush_old_exec' into two functions") split
      the function at the point of no return - ie right where there were no
      more error cases to check.  That made sense from a technical standpoint,
      but when we then also combined it with the actual personality setting
      going in between flush_old_exec() and setup_new_exec(), it needs to be a
      bit more careful.
      
      In particular, we need to make sure that we really flush the old
      personality bits in the 'flush' stage, rather than later in the 'setup'
      stage, since otherwise we might be flushing the _new_ personality state
      that we're just setting up.
      
      So this moves the flags and personality flushing (and 'flush_thread()',
      which is the arch-specific function that generally resets lazy FP state
      etc) of the old process into flush_old_exec(), so that it doesn't affect
      any state that execve() is setting up for the new process environment.
      
      This was reported by Michal Simek as breaking his Microblaze qemu
      environment.
      Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
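The ordering problem described in the commit above can be modelled in a few lines. This is a toy model, not the kernel code: `task_like`, `OLD_BITS_TO_CLEAR`, `clear_old_bits` and `exec_like` are hypothetical, and the whole of the old-state reset is reduced to clearing a bit mask.

```c
#include <assert.h>

/* Hypothetical mask of state bits that belong to the old image. */
#define OLD_BITS_TO_CLEAR 0x0fu

struct task_like {
    unsigned personality;
};

/* Reset whatever the old image left behind. */
static void clear_old_bits(struct task_like *t)
{
    t->personality &= ~OLD_BITS_TO_CLEAR;
}

/* Model of the exec sequence. The loader sets the new personality
 * between the two stages; 'clear_in_setup' selects the old ordering,
 * where the reset runs after that point. */
static unsigned exec_like(struct task_like *t, unsigned new_personality,
                          int clear_in_setup)
{
    if (!clear_in_setup)
        clear_old_bits(t);              /* reset during the 'flush' stage */

    t->personality = new_personality;   /* set by the binary loader */

    if (clear_in_setup)
        clear_old_bits(t);              /* too late: clobbers the new state */

    return t->personality;
}
```

In this model, resetting in the first stage leaves the loader's value intact, while resetting in the second stage wipes bits the loader has just set.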
  6. 01 Feb, 2010 3 commits
  7. 31 Jan, 2010 2 commits
  8. 30 Jan, 2010 1 commit
    • Split 'flush_old_exec' into two functions · 221af7f8
      Authored by Linus Torvalds
      'flush_old_exec()' is the point of no return when doing an execve(), and
      it is pretty badly misnamed.  It doesn't just flush the old executable
      environment, it also starts up the new one.
      
      Which is very inconvenient for things like setting up the new
      personality, because we want the new personality to affect the starting
      of the new environment, but at the same time we do _not_ want the new
      personality to take effect if flushing the old one fails.
      
      As a result, the x86-64 '32-bit' personality is actually done using this
      insane "I'm going to change the ABI, but I haven't done it yet" bit
      (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
      personality, but just the "pending" bit, so that "flush_thread()" can do
      the actual personality magic.
      
      This patch in no way changes any of that insanity, but it does split the
      'flush_old_exec()' function up into a preparatory part that can fail
      (still called flush_old_exec()), and a new part that will actually set
      up the new exec environment (setup_new_exec()).  All callers are changed
      to trivially comply with the new world order.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
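The shape of the split described in the commit above, a first step that can fail followed by a step that cannot, can be sketched as follows. The names `image`, `flush_old`, `setup_new` and `load_binary` are hypothetical and the state is reduced to three fields; this is an illustration of the calling pattern, not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical, heavily reduced process image state. */
struct image {
    int old_loaded;          /* old executable environment still present */
    unsigned personality;
    int new_ready;           /* new environment has been set up */
};

/* Preparatory part: may fail, and on failure leaves everything as it was. */
static int flush_old(struct image *img, int simulate_failure)
{
    if (simulate_failure)
        return -1;
    img->old_loaded = 0;     /* point of no return */
    return 0;
}

/* Second part: cannot fail, and sees the personality chosen by the caller. */
static void setup_new(struct image *img)
{
    img->new_ready = 1;
}

/* Caller pattern: the new personality is applied only after the fallible
 * step has succeeded, and before the new environment is started. */
static int load_binary(struct image *img, unsigned new_personality,
                       int simulate_failure)
{
    int ret = flush_old(img, simulate_failure);

    if (ret)
        return ret;          /* old personality stays in effect */
    img->personality = new_personality;
    setup_new(img);
    return 0;
}
```

The design point is that the caller gets a window between the two calls in which it can change state without needing a "pending" flag.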
  9. 29 Jan, 2010 8 commits
  10. 28 Jan, 2010 2 commits
    • block: fix bio_add_page for non trivial merge_bvec_fn case · 1d616585
      Authored by Dmitry Monakhov
      We have to properly decrease bi_size in order for merge_bvec_fn to return
      the right result.  Otherwise this results in false merge rejects for two
      absolutely valid bio_vecs.  This may cause a significant performance
      penalty, for example when fs_block_size == 1k and the block device is raid0
      with a small chunk_size = 8k.  Then it is impossible to merge the 7th
      fs-block into a bio which already has 6 fs-blocks.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
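The arithmetic behind the example in the commit above can be worked through with a simplified model. Everything here is an assumption made for illustration: `room_in_chunk` is a hypothetical stand-in for a merge_bvec_fn-style callback that reports how many bytes fit before the chunk boundary, the bio is assumed to start at a chunk boundary, and the last bio_vec is assumed to hold 2k of the 6k already queued.

```c
#include <assert.h>

/* Hypothetical merge callback: bytes that can still be added to a bio of
 * 'bi_size' bytes before it crosses a chunk boundary. */
static unsigned room_in_chunk(unsigned chunk_size, unsigned bi_size)
{
    return bi_size < chunk_size ? chunk_size - bi_size : 0;
}

/* Decide whether the last bio_vec (currently 'prev_len' bytes) may grow by
 * 'add_len'. The grown vector is offered to the callback as if it were new,
 * so its old length must first be taken out of the size already charged;
 * 'discount_prev' selects that corrected accounting. */
static int can_extend_last_vec(unsigned chunk_size, unsigned bi_size,
                               unsigned prev_len, unsigned add_len,
                               int discount_prev)
{
    unsigned grown = prev_len + add_len;
    unsigned charged = discount_prev ? bi_size - prev_len : bi_size;

    return room_in_chunk(chunk_size, charged) >= grown;
}
```

With an 8k chunk, 6k queued, a 2k last vector and a 1k block to add, the uncorrected accounting sees 2k of room for a 3k vector and rejects the merge, while the corrected accounting sees 4k of room and accepts it.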
    • reiserfs: Fix vmalloc call under reiserfs lock · bbec9191
      Authored by Frederic Weisbecker
      Vmalloc is called to allocate journal->j_cnode_free_list but
      we hold the reiserfs lock at this time, which raises a
      {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion.
      
      Just drop the reiserfs lock at this time, as it's not even
      needed but kept for paranoid reasons.
      
      This fixes:
      
      [ INFO: inconsistent lock state ]
      2.6.33-rc5 #1
      ---------------------------------
      inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>]
      reiserfs_write_lock_once+0x28/0x50
      {RECLAIM_FS-ON-W} state was registered at:
        [<c104ee32>] mark_held_locks+0x62/0x90
        [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0
        [<c108f7b6>] kmem_cache_alloc+0x26/0xf0
        [<c108621c>] __get_vm_area_node+0x6c/0xf0
        [<c108690e>] __vmalloc_node+0x7e/0xa0
        [<c1086aab>] vmalloc+0x2b/0x30
        [<c110e1fb>] journal_init+0x6cb/0xa10
        [<c10f90a2>] reiserfs_fill_super+0x342/0xb80
        [<c1095665>] get_sb_bdev+0x145/0x180
        [<c10f68e1>] get_super_block+0x21/0x30
        [<c1094520>] vfs_kern_mount+0x40/0xd0
        [<c1094609>] do_kern_mount+0x39/0xd0
        [<c10aaa97>] do_mount+0x2c7/0x6d0
        [<c10aaf06>] sys_mount+0x66/0xa0
        [<c16198a7>] mount_block_root+0xc4/0x245
        [<c1619a81>] mount_root+0x59/0x5f
        [<c1619b98>] prepare_namespace+0x111/0x14b
        [<c1619269>] kernel_init+0xcf/0xdb
        [<c100303a>] kernel_thread_helper+0x6/0x1c
      irq event stamp: 63236801
      hardirqs last  enabled at (63236801): [<c134e7fa>]
      __mutex_unlock_slowpath+0x9a/0x120
      hardirqs last disabled at (63236800): [<c134e799>]
      __mutex_unlock_slowpath+0x39/0x120
      softirqs last  enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110
      softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60
      
      other info that might help us debug this:
      2 locks held by kswapd0/313:
       #0:  (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170
       #1:  (&type->s_umount_key#19){++++..}, at: [<c10a2edd>]
      shrink_dcache_memory+0xfd/0x1a0
      
      stack backtrace:
      Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1
      Call Trace:
       [<c134db2c>] ? printk+0x18/0x1c
       [<c104e7ef>] print_usage_bug+0x15f/0x1a0
       [<c104ebcf>] mark_lock+0x39f/0x5a0
       [<c104d66b>] ? trace_hardirqs_off+0xb/0x10
       [<c1052c50>] ? check_usage_forwards+0x0/0xf0
       [<c1050c24>] __lock_acquire+0x214/0xa70
       [<c10438c5>] ? sched_clock_cpu+0x95/0x110
       [<c10514fa>] lock_acquire+0x7a/0xa0
       [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
       [<c134f03f>] mutex_lock_nested+0x5f/0x2b0
       [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
       [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
       [<c11118c8>] reiserfs_write_lock_once+0x28/0x50
       [<c10f05b0>] reiserfs_delete_inode+0x50/0x140
       [<c10a653f>] ? generic_delete_inode+0x5f/0x150
       [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140
       [<c10a657c>] generic_delete_inode+0x9c/0x150
       [<c10a666d>] generic_drop_inode+0x3d/0x60
       [<c10a5597>] iput+0x47/0x50
       [<c10a2a4f>] dentry_iput+0x6f/0xf0
       [<c10a2af4>] d_kill+0x24/0x50
       [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0
       [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0
       [<c1074c9e>] shrink_slab+0x10e/0x170
       [<c1075177>] kswapd+0x477/0x6a0
       [<c1072d10>] ? isolate_pages_global+0x0/0x1b0
       [<c103e160>] ? autoremove_wake_function+0x0/0x40
       [<c1074d00>] ? kswapd+0x0/0x6a0
       [<c103de6c>] kthread+0x6c/0x80
       [<c103de00>] ? kthread+0x0/0x80
       [<c100303a>] kernel_thread_helper+0x6/0x1c
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Chris Mason <chris.mason@oracle.com>
  11. 27 Jan, 2010 2 commits