1. 20 Oct 2008 (2 commits)
  2. 03 Oct 2008 (1 commit)
  3. 14 Sep 2008 (1 commit)
  4. 14 Aug 2008 (1 commit)
    • sched, cpuset: rework sched domains and CPU hotplug handling (v4) · cf417141
      Max Krasnyansky authored
      This is an updated version of my previous cpuset patch on top of
      the latest mainline git.
      The patch fixes CPU hotplug handling issues in the current cpusets code.
      Namely circular locking in rebuild_sched_domains() and unsafe access to
      the cpu_online_map in the cpuset cpu hotplug handler.
      
      This version includes changes suggested by Paul Jackson (naming, comments,
      style, etc). I also got rid of the separate workqueue thread because it is
      now safe to call get_online_cpus() from workqueue callbacks.
      
      Here are some more details:
      
      rebuild_sched_domains() is the only way to rebuild sched domains
      correctly based on the current cpuset settings. What this means
      is that we need to be able to call it from different contexts,
      like cpu hotplug for example.
      Also, the latest scheduler code in -tip now calls rebuild_sched_domains()
      directly from functions like arch_reinit_sched_domains().
      
      In order to support that properly we need to rework cpuset locking
      rules to avoid circular dependencies, which is what this patch does.
      New lock nesting rules are explained in the comments.
      We can now safely call rebuild_sched_domains() from virtually any
      context. The only requirement is that it needs to be called under
      get_online_cpus(). This allows cpu hotplug handlers and the scheduler
      to call rebuild_sched_domains() directly.
      The rest of the cpuset code now offloads sched domains rebuilds to
      a workqueue (async_rebuild_sched_domains()).
      
      This version of the patch addresses comments from the previous review.
      I fixed all mis-formatted comments and trailing spaces.
      
      I also factored out the code that builds domain masks and split up CPU and
      memory hotplug handling. This was needed to simplify locking, to avoid unsafe
      access to the cpu_online_map from mem hotplug handler, and in general to make
      things cleaner.
      
      The patch passes moderate testing (building a kernel with -j 16, creating &
      removing domains, and bringing cpus off/online at the same time) on a
      quad-core2 based machine.
      
      It passes lockdep checks, even with preemptable RCU enabled.
      This time I also tested it with the suspend/resume path and everything is
      working as expected.
      Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
      Acked-by: Paul Jackson <pj@sgi.com>
      Cc: menage@google.com
      Cc: a.p.zijlstra@chello.nl
      Cc: vegard.nossum@gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cf417141
  5. 31 Jul 2008 (4 commits)
  6. 26 Jul 2008 (7 commits)
  7. 23 Jul 2008 (1 commit)
  8. 18 Jul 2008 (1 commit)
    • cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) · e761b772
      Max Krasnyansky authored
      This is based on Linus' idea of creating cpu_active_map that prevents
      scheduler load balancer from migrating tasks to the cpu that is going
      down.
      
      It allows us to simplify domain management code and avoid unnecessary
      domain rebuilds during cpu hotplug event handling.
      
      Please ignore the cpusets part for now. It needs some more work in order
      to avoid crazy lock nesting. Although I did simplify and unify domain
      reinitialization logic. We now simply call partition_sched_domains() in
      all the cases. This means that we're using the exact same code paths as in
      the cpusets case, and hence the tests below cover cpusets too.
      Cpuset changes to make rebuild_sched_domains() callable from various
      contexts are in the separate patch (right next after this one).
      
      This not only boots but also easily handles
      	while true; do make clean; make -j 8; done
      and
      	while true; do on-off-cpu 1; done
      at the same time.
      (on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing).
      
      Surprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
      this on it right now in gnome-terminal and things are moving along just fine.
      
      Also this is running with most of the debug features enabled (lockdep,
      mutex debugging, etc.); no BUG_ONs or lockdep complaints so far.
      
      I believe I addressed all of Dmitry's comments on Linus' original version.
      I changed both the fair and rt balancers to mask out non-active cpus.
      And replaced cpu_is_offline() with !cpu_active() in the main scheduler
      code where it made sense (to me).
      Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Gregory Haskins <ghaskins@novell.com>
      Cc: dmitry.adamushko@gmail.com
      Cc: pj@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e761b772
  9. 13 Jul 2008 (1 commit)
    • cpusets, hotplug, scheduler: fix scheduler domain breakage · 3e84050c
      Dmitry Adamushko authored
      Commit f18f982a ("sched: CPU hotplug events must not destroy scheduler
      domains created by the cpusets") introduced a hotplug-related problem as
      described below:
      
      Upon CPU_DOWN_PREPARE,
      
        update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
      
      does the following:
      
      /*
       * Force a reinitialization of the sched domains hierarchy. The domains
       * and groups cannot be updated in place without racing with the balancing
       * code, so we temporarily attach all running cpus to the NULL domain
       * which will prevent rebalancing while the sched domains are recalculated.
       */
      
      The sched domains should be rebuilt when a CPU_DOWN operation has
      completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
      CPU_DOWN_FAILED{_FROZEN} (upon failure -- restoring things to their
      initial state). That's what update_sched_domains() also does, but only
      for the !CPUSETS case.
      
      With f18f982a, sched-domains' reinitialization is delegated to
      CPUSETS code:
      
      cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
      rebuild_sched_domains()
      
      Being called for CPU_UP_PREPARE (and if its callback is called after
      update_sched_domains()), it just negates all the work done by
      update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
      the sched domains and that makes it visible to the load balancer
      while the CPU_DOWN operation is in progress.
      
      __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
      "offline" when this function is called).
      
      try_to_wake_up() is called for one of these tasks from another CPU ->
      the load-balancer (wake_idle()) picks up a "dead" CPU and places the
      task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
      -> oops.
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: miaox@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3e84050c
  10. 19 Jun 2008 (2 commits)
  11. 10 Jun 2008 (1 commit)
  12. 07 Jun 2008 (1 commit)
  13. 06 Jun 2008 (1 commit)
    • sched: CPU hotplug events must not destroy scheduler domains created by the cpusets · 5c8e1ed1
      Max Krasnyansky authored
      The first issue is not related to cpusets: we're simply leaking doms_cur.
      It's allocated in arch_init_sched_domains(), which is called for every
      hotplug event, so we just keep reallocating doms_cur without freeing it.
      I introduced a free_sched_domains() function that cleans things up.
      
      The second issue is that sched domains created by the cpusets are
      completely destroyed by CPU hotplug events. For all CPU hotplug
      events the scheduler attaches all CPUs to the NULL domain and then puts
      them all into a single domain, thereby destroying the domains created
      by the cpusets (partition_sched_domains).
      The solution is simple: when cpusets are enabled, the scheduler should not
      create the default domain and should instead let cpusets do that, which is
      exactly what the patch does.
      Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
      Cc: pj@sgi.com
      Cc: menage@google.com
      Cc: rostedt@goodmis.org
      Cc: mingo@elte.hu
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5c8e1ed1
  14. 09 May 2008 (1 commit)
  15. 29 Apr 2008 (5 commits)
  16. 28 Apr 2008 (3 commits)
    • mempolicy: rename mpol_copy to mpol_dup · 846a16bf
      Lee Schermerhorn authored
      This patch renames mpol_copy() to mpol_dup() because, well, that's what it
      does.  Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an
      existing mempolicy, allocates a new one and copies the contents.
      
      In a later patch, I want to use the name mpol_copy() to copy the contents from
      one mempolicy to another like, e.g., strcpy() does for strings.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      846a16bf
    • mm: filter based on a nodemask as well as a gfp_mask · 19770b32
      Mel Gorman authored
      The MPOL_BIND policy creates a zonelist that is used for allocations
      controlled by that mempolicy.  As the per-node zonelist is already being
      filtered based on a zone id, this patch adds a version of __alloc_pages() that
      takes a nodemask for further filtering.  This eliminates the need for
      MPOL_BIND to create a custom zonelist.
      
      A positive benefit of this is that allocations using MPOL_BIND now use the
      local node's distance-ordered zonelist instead of a custom node-id-ordered
      zonelist.  I.e., pages will be allocated from the closest allowed node with
      available memory.
      
      [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      19770b32
    • mm: have zonelist contains structs with both a zone pointer and zone_idx · dd1a239f
      Mel Gorman authored
      Filtering zonelists requires very frequent use of zone_idx().  This is costly
      as it involves a lookup of another structure and a subtraction operation.  As
      the zone_idx is often required, it should be quickly accessible.  The node idx
      could also be stored here if it was found that accessing zone->node is
      significant, which may be the case on workloads where nodemasks are heavily
      used.
      
      This patch introduces a struct zoneref to store a zone pointer and a zone
      index.  The zonelist then consists of an array of these struct zonerefs which
      are looked up as necessary.  Helpers are given for accessing the zone index as
      well as the node index.
      
      [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
      [hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
      [hugh@veritas.com: just return do_try_to_free_pages]
      [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd1a239f
  17. 20 Apr 2008 (3 commits)
  18. 06 Mar 2008 (1 commit)
  19. 09 Feb 2008 (1 commit)
    • proc: seqfile convert proc_pid_status to properly handle pid namespaces · df5f8314
      Eric W. Biederman authored
      Currently we possibly look up the pid in the wrong pid namespace.  So
      convert proc_pid_status to seq_file, which ensures the proper pid namespace
      is passed in.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: another build fix]
      [akpm@linux-foundation.org: s390 build fix]
      [akpm@linux-foundation.org: fix task_name() output]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df5f8314
  20. 08 Feb 2008 (2 commits)
    • hotplug cpu move tasks in empty cpusets - refinements · b4501295
      Paul Jackson authored
      - Narrow the scope of callback_mutex in scan_for_empty_cpusets().
      
      - Avoid rewriting the cpus, mems of cpusets except when it is likely that
        we'll be changing them.
      
      - Have remove_tasks_in_empty_cpuset() also check for empty mems.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Acked-by: Cliff Wickman <cpw@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4501295
    • hotplug cpu: move tasks in empty cpusets to parent various other fixes · c8d9c90c
      Paul Jackson authored
      Various minor formatting and comment tweaks to Cliff Wickman's
      [PATCH_3_of_3]_cpusets__update_cpumask_revision.patch
      
      I had had "iff", meaning "if and only if", in a comment.  However, except for
      ancient mathematicians, the abbreviation "iff" was a tad too cryptic.  Cliff
      changed it to "if", presumably figuring that the "iff" was a typo.  However,
      it was the "only if" half of the conjunction that was most interesting.
      Reworded to emphasize the "only if" aspect.
      
      The locking comment for remove_tasks_in_empty_cpuset() was wrong; it said
      callback_mutex had to be held on entry.  The opposite is true.
      
      Several mentions of attach_task() in comments needed to be
      changed to cgroup_attach_task().
      
      A comment about notify_on_release was no longer relevant,
      as the line of code it had commented, namely:
      	set_bit(CS_RELEASED_RESOURCE, &parent->flags);
      is no longer present in that place in the cpuset.c code.
      
      Similarly a comment about notify_on_release before the
      scan_for_empty_cpusets() routine was no longer relevant.
      
      Removed extra parentheses and unnecessary return statement.
      
      Renamed attach_task() to cpuset_attach() in various comments.
      
      Removed comment about not needing memory migration, as it seems the migration
      is done anyway, via the cpuset_attach() callback from cgroup_attach_task().
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Acked-by: Cliff Wickman <cpw@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8d9c90c