1. 04 12月, 2013 3 次提交
    • P
      rcu: Break call_rcu() deadlock involving scheduler and perf · 96d3fd0d
      Paul E. McKenney 提交于
      Dave Jones got the following lockdep splat:
      
      >  ======================================================
      >  [ INFO: possible circular locking dependency detected ]
      >  3.12.0-rc3+ #92 Not tainted
      >  -------------------------------------------------------
      >  trinity-child2/15191 is trying to acquire lock:
      >   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >
      > but task is already holding lock:
      >   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > which lock already depends on the new lock.
      >
      >
      > the existing dependency chain (in reverse order) is:
      >
      > -> #3 (&ctx->lock){-.-...}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
      >         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
      >         [<ffffffff81732052>] __schedule+0x1d2/0xa20
      >         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
      >         [<ffffffff817352b6>] retint_kernel+0x26/0x30
      >         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
      >         [<ffffffff813f0504>] pty_write+0x54/0x60
      >         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
      >         [<ffffffff813e5838>] tty_write+0x158/0x2d0
      >         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
      >         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > -> #2 (&rq->lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
      >         [<ffffffff81054336>] do_fork+0x126/0x460
      >         [<ffffffff81054696>] kernel_thread+0x26/0x30
      >         [<ffffffff8171ff93>] rest_init+0x23/0x140
      >         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
      >         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
      >         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
      >
      > -> #1 (&p->pi_lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
      >         [<ffffffff81097d62>] default_wake_function+0x12/0x20
      >         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
      >         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
      >         [<ffffffff8108ff59>] __wake_up+0x39/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
      >         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
      >         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
      >         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
      >         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
      >         [<ffffffff817200be>] kernel_init+0xe/0x190
      >         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
      >
      > -> #0 (&rdp->nocb_wq){......}:
      >         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >         [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
      >
      >   Possible unsafe locking scenario:
      >
      >         CPU0                    CPU1
      >         ----                    ----
      >    lock(&ctx->lock);
      >                                 lock(&rq->lock);
      >                                 lock(&ctx->lock);
      >    lock(&rdp->nocb_wq);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by trinity-child2/15191:
      >  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > stack backtrace:
      > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
      >  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
      >  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
      >  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
      > Call Trace:
      >  [<ffffffff8172a363>] dump_stack+0x4e/0x82
      >  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
      >  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
      >  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
      >  [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >  [<ffffffff81111450>] __call_rcu+0x140/0x820
      >  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
      >  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >  [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
      >  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
      >  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      
      The underlying problem is that perf is invoking call_rcu() with the
      scheduler locks held, but in NOCB mode, call_rcu() will with high
      probability invoke the scheduler -- which just might want to use its
      locks.  The reason that call_rcu() needs to invoke the scheduler is
      to wake up the corresponding rcuo callback-offload kthread, which
      does the job of starting up a grace period and invoking the callbacks
      afterwards.
      
      One solution (championed on a related problem by Lai Jiangshan) is to
      simply defer the wakeup to some point where scheduler locks are no longer
      held.  Since we don't want to unnecessarily incur the cost of such
      deferral, the task before us is threefold:
      
      1.	Determine when it is likely that a relevant scheduler lock is held.
      
      2.	Defer the wakeup in such cases.
      
      3.	Ensure that all deferred wakeups eventually happen, preferably
      	sooner rather than later.
      
      We use irqs_disabled_flags() as a proxy for relevant scheduler locks
      being held.  This works because the relevant locks are always acquired
      with interrupts disabled.  We may defer more often than needed, but that
      is at least safe.
      
      The wakeup deferral is tracked via a new field in the per-CPU and
      per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
      
      This flag is checked by the RCU core processing.  The __rcu_pending()
      function now checks this flag, which causes rcu_check_callbacks()
      to initiate RCU core processing at each scheduling-clock interrupt
      where this flag is set.  Of course this is not sufficient because
      scheduling-clock interrupts are often turned off (the things we used to
      be able to count on!).  So the flags are also checked on entry to any
      state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
      state and NO_HZ_FULL user-mode-execution state.
      
      This approach should allow call_rcu() to be invoked regardless of what
      locks you might be holding, the key word being "should".
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      96d3fd0d
    • P
      rcu: Fix and comment ordering around wait_event() · 78e4bc34
      Paul E. McKenney 提交于
      It is all too easy to forget that wait_event() does not necessarily
      imply a full memory barrier.  The case where it does not is where the
      condition transitions to true just as wait_event() starts execution.
      This is actually a feature: The standard use of wait_event() involves
      locking, in which case the locks provide the needed ordering (you hold a
      lock across the wake_up() and acquire that same lock after wait_event()
      returns).
      
      Given that I did forget that wait_event() does not necessarily imply a
      full memory barrier in one case, this commit fixes that case.  This commit
      also adds comments calling out the placement of existing memory barriers
      relied on by wait_event() calls.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      78e4bc34
    • P
      rcu: Kick CPU halfway to RCU CPU stall warning · 6193c76a
      Paul E. McKenney 提交于
      When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
      itself.  This can help move the grace period forward in some situations,
      but it would be even better to do this -before- the RCU CPU stall warning.
      This commit therefore causes resched_cpu() to be called every five jiffies
      once the system is halfway to an RCU CPU stall warning.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      6193c76a
  2. 30 11月, 2013 10 次提交
    • L
      Linux 3.13-rc2 · dc1ccc48
      Linus Torvalds 提交于
      dc1ccc48
    • L
      Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · d5ff835f
      Linus Torvalds 提交于
      Pull ARM64 fixes from Catalin Marinas:
       - Remove preempt_count modifications in the arm64 IRQ handling code
         since that's already dealt with in generic irq_enter/irq_exit
       - PTE_PROT_NONE bit moved higher up to avoid overlapping with the
         hardware bits (for PROT_NONE mappings which are pte_present)
       - Big-endian fixes for ptrace support
       - Asynchronous aborts unmasking while in the kernel
       - pgprot_writecombine() change to create Normal NonCacheable memory
         rather than Device GRE
      
      * tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
        arm64: Move PTE_PROT_NONE higher up
        arm64: Use Normal NonCacheable memory for writecombine
        arm64: debug: make aarch32 bkpt checking endian clean
        arm64: ptrace: fix compat registes get/set to be endian clean
        arm64: Unmask asynchronous aborts when in kernel mode
        arm64: dts: Reserve the memory used for secondary CPU release address
        arm64: let the core code deal with preempt_count
      d5ff835f
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 033dbbde
      Linus Torvalds 提交于
      Pull s390 updates from Martin Schwidefsky:
       "One performance improvement and a few bug fixes.  Two of the fixes
        deal with the clock related problems we have seen on recent kernels"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/mm: handle asce-type exceptions as normal page fault
        s390,time: revert direct ktime path for s390 clockevent device
        s390/time,vdso: convert to the new update_vsyscall interface
        s390/uaccess: add missing page table walk range check
        s390/mm: optimize copy_page
        s390/dasd: validate request size before building CCW/TCW request
        s390/signal: always restore saved runtime instrumentation psw bit
      033dbbde
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · dc418f6e
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Some easy but needed fixes for i2c drivers since rc1"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: bcm2835: Linking platform nodes to adapter nodes
        i2c: omap: raw read and write endian fix
        i2c: i2c-bcm-kona: Fix module build
        i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2C
        i2c: bcm-kona: remove duplicated include
        i2c: davinci: raw read and write endian fix
      dc418f6e
    • L
      Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 7224b31b
      Linus Torvalds 提交于
      Pull workqueue fixes from Tejun Heo:
       "This contains one important fix.  The NUMA support added a while back
        broke ordering guarantees on ordered workqueues.  It was enforced by
        having single frontend interface with @max_active == 1 but the NUMA
        support puts multiple interfaces on unbound workqueues on NUMA
        machines thus breaking the ordered guarantee.  This is fixed by
        disabling NUMA support on ordered workqueues.
      
        The above and a couple other patches were sitting in for-3.12-fixes
        but I forgot to push that out, so they ended up waiting a bit too
        long.  My aplogies.
      
        Other fixes are minor"
      
      * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
        workqueue: fix comment typo for __queue_work()
        workqueue: fix ordered workqueues in NUMA setups
        workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
      7224b31b
    • L
      Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · de92a058
      Linus Torvalds 提交于
      Pull libata fixes from Tejun Heo:
       "libata device removal path was removing parent device node before its
        child, which is mostly harmless but triggers warning after recent
        sysfs changes.  Rafael's patch fixes the order.
      
        Other than that, minor controller-specific fixes and device ID
        additions"
      
      * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ATA: Fix port removal ordering
        ahci: add Marvell 9230 to the AHCI PCI device list
        ata: fix acpi_bus_get_device() return value check
        pata_arasan_cf: add missing clk_disable_unprepare() on error path
        ahci: add support for IBM Akebono platform device
      de92a058
    • L
      Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 2855987d
      Linus Torvalds 提交于
      Pull cgroup fixes from Tejun Heo:
       "Fixes for three issues.
      
         - cgroup destruction path could swamp system_wq possibly leading to
           deadlock.  This actually seems to happen in the wild with memcg
           because memcg destruction path adds nested dependency on system_wq.
      
           Resolved by isolating cgroup destruction work items on its
           dedicated workqueue.
      
         - Possible locking context deadlock through seqcount reported by
           lockdep
      
         - Memory leak under certain conditions"
      
      * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: fix cgroup_subsys_state leak for seq_files
        cpuset: Fix memory allocator deadlock
        cgroup: use a dedicated workqueue for cgroup destruction
      2855987d
    • L
      Merge tag 'sound-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b8495995
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "Quite a few HD-Audio fixes, a WUSB audio fix and a fix for FireWire
        audio.  The HD-audio part contains a couple of fixes for the generic
        parser, and these are the only intrusive fixes.  The rest are mostly
        device-specific fixes"
      
      * tag 'sound-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Add LFE chmap to ASUS ET2700
        ALSA: hda - Initialize missing bass speaker pin for ASUS AIO ET2700
        ALSA: hda - limit mic boost on Asus UX31[A,E]
        ALSA: hda - Check leaf nodes to find aamix amps
        ALSA: hda - Fix hp-mic mode without VREF bits
        ALSA: hda - Create Headhpone Mic Jack Mode when really needed
        ALSA: usb: use multiple packets per urb for Wireless USB inbound audio
        ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec
        ALSA: hda - Drop bus->avoid_link_reset flag
        ALSA: hda/realtek - Set pcbeep amp for ALC668
        ALSA: hda/realtek - Add support of ALC231 codec
        ALSA: firewire-lib: fix wrong value for FDF field as an empty packet
      b8495995
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · b01537bf
      Linus Torvalds 提交于
      Pull vfs dentry reference count fix from Al Viro.
      
      This fixes a possible inode_permission NULL pointer dereference (and
      other problems) that were due to the root dentry count being decremented
      too much.  In commit 48a066e7 ("RCU'd vfsmounts") the placement of
      clearing the LOOKUP_RCU bit changed, and we then returned failure of
      incrementing the lockref on the parent dentry with LOOKUP_RCU cleared.
      
      But that meant we needed to go through the same cleanup routines that
      the later failures did wrt LOOKUP_ROOT and nd->root.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix bogus path_put() of nd->root after some unlazy_walk() failures
      b01537bf
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 282c183b
      Linus Torvalds 提交于
      Pull drm qxl leak fix from Dave Airlie:
       "As usual 5 mins after I send a trivial pull fix I find a real bug!
      
        This fixes a memory leak and I'd like to get it into stable queue
        asap"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/qxl: fix memory leak in release list handling
      282c183b
  3. 29 11月, 2013 10 次提交
  4. 28 11月, 2013 17 次提交