1. 14 8月, 2012 3 次提交
    • M
      sched,cgroup: Fix up task_groups list · 35cf4e50
      Mike Galbraith 提交于
      With multiple instances of task_groups, for_each_rt_rq() is a noop,
      no task groups having been added to the rt.c list instance.  This
      renders __enable/disable_runtime() and print_rt_stats() noop, the
      user (non) visible effect being that rt task groups are missing in
      /proc/sched_debug.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: stable@kernel.org # v3.3+
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      35cf4e50
    • S
      sched: fix divide by zero at {thread_group,task}_times · bea6832c
      Stanislaw Gruszka 提交于
      On architectures where cputime_t is 64 bit type, is possible to trigger
      divide by zero on do_div(temp, (__force u32) total) line, if total is a
      non zero number but has lower 32 bit's zeroed. Removing casting is not
      a good solution since some do_div() implementations do cast to u32
      internally.
      
      This problem can be triggered in practice on very long lived processes:
      
        PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
         #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
         #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
         #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
         #3 [ffff880472a51cd0] die at ffffffff8100f26b
         #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
         #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
         #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
            [exception RIP: thread_group_times+0x56]
            RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
            RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
            RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
            RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
            R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
            R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
         #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
         #8 [ffff880472a51f40] sys_times at ffffffff81088524
         #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
            RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
            RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
            RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
            RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
            R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
            R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
            ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      bea6832c
    • P
      sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies · a35b6466
      Peter Zijlstra 提交于
      Peter Portante reported that for large cgroup hierarchies (and or on
      large CPU counts) we get immense lock contention on rq->lock and stuff
      stops working properly.
      
      His workload was a ton of processes, each in their own cgroup,
      everybody idling except for a sporadic wakeup once every so often.
      
      It was found that:
      
        schedule()
          idle_balance()
            load_balance()
              local_irq_save()
              double_rq_lock()
              update_h_load()
                walk_tg_tree(tg_load_down)
                  tg_load_down()
      
      Results in an entire cgroup hierarchy walk under rq->lock for every
      new-idle balance and since new-idle balance isn't throttled this
      results in a lot of work while holding the rq->lock.
      
      This patch does two things, it removes the work from under rq->lock
      based on the good principle of race and pray which is widely employed
      in the load-balancer as a whole. And secondly it throttles the
      update_h_load() calculation to max once per jiffy.
      
      I considered excluding update_h_load() for new-idle balance
      all-together, but purely relying on regular balance passes to update
      this data might not work out under some rare circumstances where the
      new-idle busiest isn't the regular busiest for a while (unlikely, but
      a nightmare to debug if someone hits it and suffers).
      
      Cc: pjt@google.com
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Reported-by: NPeter Portante <pportant@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a35b6466
  2. 31 7月, 2012 1 次提交
  3. 26 7月, 2012 2 次提交
  4. 24 7月, 2012 9 次提交
  5. 22 7月, 2012 4 次提交
  6. 19 7月, 2012 1 次提交
    • L
      Make wait_for_device_probe() also do scsi_complete_async_scans() · eea03c20
      Linus Torvalds 提交于
      Commit a7a20d10 ("sd: limit the scope of the async probe domain")
      make the SCSI device probing run device discovery in it's own async
      domain.
      
      However, as a result, the partition detection was no longer synchronized
      by async_synchronize_full() (which, despite the name, only synchronizes
      the global async space, not all of them).  Which in turn meant that
      "wait_for_device_probe()" would not wait for the SCSI partitions to be
      parsed.
      
      And "wait_for_device_probe()" was what the boot time init code relied on
      for mounting the root filesystem.
      
      Now, most people never noticed this, because not only is it
      timing-dependent, but modern distributions all use initrd.  So the root
      filesystem isn't actually on a disk at all.  And then before they
      actually mount the final disk filesystem, they will have loaded the
      scsi-wait-scan module, which not only does the expected
      wait_for_device_probe(), but also does scsi_complete_async_scans().
      
      [ Side note: scsi_complete_async_scans() had also been partially broken,
        but that was fixed in commit 43a8d39d ("fix async probe
        regression"), so that same commit a7a20d10 had actually broken
        setups even if you used scsi-wait-scan explicitly ]
      
      Solve this problem by just moving the scsi_complete_async_scans() call
      into wait_for_device_probe().  Everybody who wants to wait for device
      probing to finish really wants the SCSI probing to complete, so there's
      no reason not to do this.
      
      So now "wait_for_device_probe()" really does what the name implies, and
      properly waits for device probing to finish.  This also removes the now
      unnecessary extra calls to scsi_complete_async_scans().
      Reported-and-tested-by: NArtem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: linux-scsi <linux-scsi@vger.kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eea03c20
  7. 17 7月, 2012 1 次提交
  8. 15 7月, 2012 1 次提交
  9. 12 7月, 2012 7 次提交
  10. 10 7月, 2012 2 次提交
  11. 08 7月, 2012 2 次提交
    • T
      cgroup: fix cgroup hierarchy umount race · 5db9a4d9
      Tejun Heo 提交于
      48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal
      optional" allowed a css to linger after the associated cgroup is
      removed.  As a css holds a reference on the cgroup's dentry, it means
      that cgroup dentries may linger for a while.
      
      Destroying a superblock which has dentries with positive refcnts is a
      critical bug and triggers BUG() in vfs code.  As each cgroup dentry
      holds an s_active reference, any lingering cgroup has both its dentry
      and the superblock pinned and thus preventing premature release of
      superblock.
      
      Unfortunately, after 48ddbe19, there's a small window while
      releasing a cgroup which is directly under the root of the hierarchy.
      When a cgroup directory is released, vfs layer first deletes the
      corresponding dentry and then invokes dput() on the parent, which may
      recurse further, so when a cgroup directly below root cgroup is
      released, the cgroup is first destroyed - which releases the s_active
      it was holding - and then the dentry for the root cgroup is dput().
      
      This creates a window where the root dentry's refcnt isn't zero but
      superblock's s_active is.  If umount happens before or during this
      window, vfs will see the root dentry with non-zero refcnt and trigger
      BUG().
      
      Before 48ddbe19, this problem didn't exist because the last dentry
      reference was guaranteed to be put synchronously from rmdir(2)
      invocation which holds s_active around the whole process.
      
      Fix it by holding an extra superblock->s_active reference across
      dput() from css release, which is the dput() path added by 48ddbe19
      and the only one which doesn't hold an extra s_active ref across the
      final cgroup dput().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: Nshyju pv <shyju.pv@huawei.com>
      Tested-by: Nshyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      5db9a4d9
    • T
      Revert "cgroup: superblock can't be released with active dentries" · 7db5b3ca
      Tejun Heo 提交于
      This reverts commit fa980ca8.  The
      commit was an attempt to fix a race condition where a cgroup hierarchy
      may be unmounted with positive dentry reference on root cgroup.  While
      the commit made the race condition slightly more difficult to trigger,
      the race was still there and could be reliably triggered using a
      different test case.
      
      Revert the incorrect fix.  The next commit will describe the race and
      fix it correctly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: Nshyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      7db5b3ca
  12. 07 7月, 2012 5 次提交
  13. 06 7月, 2012 2 次提交