1. 06 August 2006 (2 commits)
    • [PATCH] bug in futex unqueue_me · e91467ec
      Authored by Christian Borntraeger
      This patch adds a barrier() in futex unqueue_me to avoid aliasing of two
      pointers.
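      
      For reference, barrier() is the kernel's compiler barrier for gcc; the
      "memory" clobber tells the compiler that memory may have changed across
      it, so it must keep using the lock_ptr value it already read rather
      than loading q->lock_ptr a second time:
      
      /* kernel definition (include/linux/compiler-gcc.h) */
      #define barrier() __asm__ __volatile__("": : :"memory")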
      
      On my s390x system I saw the following oops:
      
      Unable to handle kernel pointer dereference at virtual kernel address
      0000000000000000
      Oops: 0004 [#1]
      CPU:    0    Not tainted
      Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8)
      Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30)
      Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000
                 0000000000000000 0000000000000000 000001fe00002028 00000000000c091f
                 000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0
                 00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8
      Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04
      Call Trace:
      ([<0000000000144f90>] unqueue_me+0x40/0xe4)
       [<0000000000145a0c>] do_futex+0x33c/0xc40
       [<000000000014643e>] sys_futex+0x12e/0x144
       [<000000000010bb00>] sysc_noemu+0x10/0x16
       [<000002000003741c>] 0x2000003741c
      
      The code in question is:
      
      static int unqueue_me(struct futex_q *q)
      {
              int ret = 0;
              spinlock_t *lock_ptr;
      
              /* In the common case we don't take the spinlock, which is nice. */
       retry:
              lock_ptr = q->lock_ptr;
              if (lock_ptr != 0) {
                      spin_lock(lock_ptr);
                /*
                 * q->lock_ptr can change between reading it and
                 * spin_lock(), causing us to take the wrong lock.  This
                 * corrects the race condition.
      [...]
      
      and my compiler (gcc 4.1.0) generates the following code from it:
      
      00000000000003c8 <unqueue_me>:
           3c8:       eb bf f0 70 00 24       stmg    %r11,%r15,112(%r15)
           3ce:       c0 d0 00 00 00 00       larl    %r13,3ce <unqueue_me+0x6>
                              3d0: R_390_PC32DBL      .rodata+0x2a
           3d4:       a7 f1 1e 00             tml     %r15,7680
           3d8:       a7 84 00 01             je      3da <unqueue_me+0x12>
           3dc:       b9 04 00 ef             lgr     %r14,%r15
           3e0:       a7 fb ff d0             aghi    %r15,-48
           3e4:       b9 04 00 b2             lgr     %r11,%r2
           3e8:       e3 e0 f0 98 00 24       stg     %r14,152(%r15)
           3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
      		/* write q->lock_ptr in r12 */
           3f4:       b9 02 00 cc             ltgr    %r12,%r12
           3f8:       a7 84 00 4b             je      48e <unqueue_me+0xc6>
      		/* if r12 is zero then jump over the code.... */
           3fc:       e3 20 b0 28 00 04       lg      %r2,40(%r11)
      		/* write q->lock_ptr in r2 */
           402:       c0 e5 00 00 00 00       brasl   %r14,402 <unqueue_me+0x3a>
                              404: R_390_PC32DBL      _spin_lock+0x2
      		/* use r2 as parameter for spin_lock */
      
      So the code becomes more or less:
      if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
      instead of
      if (lock_ptr != 0) spin_lock(lock_ptr)
      
      Which caused the oops from above.
      After adding a barrier, gcc generates code without this problem:
      [...] (the same)
           3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
           3f4:       b9 02 00 cc             ltgr    %r12,%r12
           3f8:       b9 04 00 2c             lgr     %r2,%r12
           3fc:       a7 84 00 48             je      48c <unqueue_me+0xc4>
           400:       c0 e5 00 00 00 00       brasl   %r14,400 <unqueue_me+0x38>
                              402: R_390_PC32DBL      _spin_lock+0x2
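      
      In C terms, the fix amounts to a single barrier() between the read and
      the test (a sketch reconstructed from the description above; the actual
      unqueueing work is elided):
      
      static int unqueue_me(struct futex_q *q)
      {
              int ret = 0;
              spinlock_t *lock_ptr;
      
              /* In the common case we don't take the spinlock, which is nice. */
       retry:
              lock_ptr = q->lock_ptr;
              barrier();      /* force gcc to reuse this one read below */
              if (lock_ptr != 0) {
                      spin_lock(lock_ptr);
                      /*
                       * q->lock_ptr can change between the read above and
                       * spin_lock(); recheck under the lock and retry if so.
                       */
                      if (unlikely(lock_ptr != q->lock_ptr)) {
                              spin_unlock(lock_ptr);
                              goto retry;
                      }
                      /* ... actual unqueueing elided ... */
                      spin_unlock(lock_ptr);
                      ret = 1;
              }
              return ret;
      }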
      
      As a general note, this unqueue_me code seems a bit fishy.  The retry
      logic of unqueue_me only works if we can guarantee that the original
      value of q->lock_ptr always points to a spinlock (otherwise we
      overwrite kernel memory).  We know that q->lock_ptr can change.  I
      don't know what happens to the original spinlock, as I am not an
      expert on the futex code.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@timesys.com>
      Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make suspend possible with a traced process at a breakpoint · a7ef7878
      Authored by Rafael J. Wysocki
      It should be possible to suspend, either to RAM or to disk, if there's a
      traced process that has just reached a breakpoint.  However, this is a
      special case, because its parent process might have been frozen already and
      then we are unable to deliver the "freeze" signal to the traced process.
      If this happens, it's better to cancel the freezing of the traced process.
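      
      A sketch of that decision (the helper names below are illustrative
      stand-ins for the logic described, not the actual patch):
      
      /*
       * Illustrative only: tracer_frozen(), cancel_freezing() and
       * send_freeze_signal() are hypothetical names.
       */
      static void freeze_traced_task(struct task_struct *p)
      {
              if ((p->ptrace & PT_PTRACED) && tracer_frozen(p)) {
                      /*
                       * The tracer is already frozen, so it can never resume
                       * this child and the "freeze" signal would be lost.
                       * Give up on freezing the child instead.
                       */
                      cancel_freezing(p);
                      return;
              }
              send_freeze_signal(p);
      }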
      
      Ref. http://bugzilla.kernel.org/show_bug.cgi?id=6787
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 03 August 2006 (9 commits)
  3. 01 August 2006 (14 commits)
  4. 29 July 2006 (2 commits)
  5. 24 July 2006 (2 commits)
    • [PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock · abb5a5cc
      Authored by Paul Jackson
      Fix ABBA deadlock between lock_cpu_hotplug() and the cpuset
      callback_mutex lock.
      
      It only happens on cpu_exclusive cpusets, due to the dynamic
      sched domain code trying to take the cpu hotplug lock inside
      the cpuset callback_mutex lock.
      
      This bug has apparently been here for several months, but didn't
      get hit until the right customer load on a large system.
      
      This fix appears correct from inspection, but it will take a few more
      days of running it on that customer's workload before we are confident
      we nailed it.  We don't have any other reproducible test case.
      
      The cpu_hotplug_lock() tends to cover large runs of code.
      The other places that hold both that lock and the cpuset callback
      mutex lock always nest the cpuset lock inside the hotplug lock.
      This place tries to do the reverse, risking an ABBA deadlock.
      
      This is in the cpuset_rmdir() code, where we:
        * take the callback_mutex lock
        * mark the cpuset CS_REMOVED
        * call update_cpu_domains for cpu_exclusive cpusets
        * in that call, take the cpu_hotplug lock if the
          cpuset is marked for removal.
      
      Thanks to Jack Steiner for identifying this deadlock.
      
      The fix is to tear down the dynamic sched domain before we grab
      the cpuset callback_mutex lock.  This way, the two locks are
      serialized, with the hotplug lock taken and released before
      trying for the cpuset lock.
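      
      As a generic illustration (not the cpuset code itself), the ABBA
      pattern and the serialized fix look like this:
      
      static DEFINE_MUTEX(hotplug_lock);     /* "A": outer lock everywhere else */
      static DEFINE_MUTEX(callback_mutex);   /* "B": nests inside A */
      
      /* Deadlock-prone: takes B then A, the reverse of every other path. */
      static void rmdir_buggy(void)
      {
              mutex_lock(&callback_mutex);
              mutex_lock(&hotplug_lock);      /* ABBA against A-then-B paths */
              /* ... tear down sched domain, remove cpuset ... */
              mutex_unlock(&hotplug_lock);
              mutex_unlock(&callback_mutex);
      }
      
      /* Fixed: finish the hotplug-locked work first, so the two locks are
       * never held in the reverse order. */
      static void rmdir_fixed(void)
      {
              mutex_lock(&hotplug_lock);
              /* ... tear down the dynamic sched domain ... */
              mutex_unlock(&hotplug_lock);
      
              mutex_lock(&callback_mutex);
              /* ... remove the cpuset ... */
              mutex_unlock(&callback_mutex);
      }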
      
      I suspect that this bug was introduced when I changed the
      cpuset locking from one lock to two.  The dynamic sched domain
      dependency on cpu_exclusive cpusets and its hotplug hooks were
      added to this code earlier, when cpusets had only a single lock.
      It may well have been fine then.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • cpu hotplug: simplify and hopefully fix locking · aa953877
      Authored by Linus Torvalds
      The CPU hotplug locking was quite messy, with a recursive lock used to
      handle the fact that the actual up/down sequence wanted to protect
      itself from being re-entered, while the callbacks it called also
      tended to want to protect themselves from CPU events.
      
      This splits the lock into two (one to serialize the whole hotplug
      sequence, the other to protect against the CPU present bitmaps
      changing). The latter still allows recursive usage because some
      subsystems (ondemand policy for cpufreq at least) had already gotten
      too used to the lax locking, but the locking mistakes are hopefully
      now less fundamental, and we now warn about recursive lock usage
      when we see it, in the hope that it can be fixed.
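      
      A sketch of the two-lock scheme described above (names and bookkeeping
      illustrative, not the actual implementation):
      
      static DEFINE_MUTEX(cpu_add_remove_lock);  /* serializes a whole up/down */
      static DEFINE_MUTEX(cpu_bitmask_lock);     /* protects the present bitmaps */
      static struct task_struct *bitmask_owner;  /* for (warned) recursion */
      static int bitmask_depth;
      
      void lock_cpu_hotplug(void)
      {
              if (bitmask_owner == current) {
                      /* Recursive use by legacy callers: tolerated, but
                       * warn so the call site can eventually be fixed. */
                      WARN_ON(1);
                      bitmask_depth++;
                      return;
              }
              mutex_lock(&cpu_bitmask_lock);
              bitmask_owner = current;
              bitmask_depth = 1;
      }
      
      void unlock_cpu_hotplug(void)
      {
              if (--bitmask_depth == 0) {
                      bitmask_owner = NULL;
                      mutex_unlock(&cpu_bitmask_lock);
              }
      }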
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 15 July 2006 (11 commits)