1. 31 Jul, 2008 (2 commits)
  2. 26 Jul, 2008 (7 commits)
  3. 23 Jul, 2008 (1 commit)
  4. 18 Jul, 2008 (1 commit)
    • cpu hotplug, sched: Introduce cpu_active_map and redo sched domain management (take 2) · e761b772
      Max Krasnyansky authored
This is based on Linus' idea of creating a cpu_active_map that
      prevents the scheduler load balancer from migrating tasks to a
      cpu that is going down.
      
It allows us to simplify domain management code and avoid unnecessary
      domain rebuilds during cpu hotplug event handling.
      
Please ignore the cpusets part for now; it needs some more work to
      avoid crazy lock nesting, although I did simplify and unify the
      domain reinitialization logic. We now simply call
      partition_sched_domains() in all cases. This means we're using
      exactly the same code paths as in the cpusets case, and hence the
      tests below cover cpusets too.
      The cpuset changes to make rebuild_sched_domains() callable from
      various contexts are in a separate patch (right after this one).
      
      This not only boots but also easily handles
      	while true; do make clean; make -j 8; done
      and
      	while true; do on-off-cpu 1; done
      at the same time.
(on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing).
      
Surprisingly the box (dual-core Core2) is quite usable. In fact I'm
      typing this on it right now in gnome-terminal and things are moving
      just fine.
      
Also, this is running with most of the debug features enabled (lockdep,
      mutex debugging, etc); no BUG_ONs or lockdep complaints so far.
      
I believe I addressed all of Dmitry's comments on Linus' original
      version. I changed both the fair and rt balancers to mask out
      non-active cpus.
      And replaced cpu_is_offline() with !cpu_active() in the main scheduler
      code where it made sense (to me).
Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Gregory Haskins <ghaskins@novell.com>
      Cc: dmitry.adamushko@gmail.com
      Cc: pj@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
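      For illustration, a minimal sketch of the masking idea in the commit
      above. cpu_active() is the predicate this patch introduces; the
      surrounding helper and its name are hypothetical:

      /* Sketch: pick a balance target, skipping CPUs that are still
       * online but no longer active (i.e. on their way down). */
      static int find_active_target(const cpumask_t *allowed)
      {
              int cpu;

              for_each_cpu_mask(cpu, *allowed) {
                      if (!cpu_active(cpu))   /* going down: skip it */
                              continue;
                      return cpu;             /* first active candidate */
              }
              return -1;                      /* nothing active in the mask */
      }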
  5. 13 Jul, 2008 (1 commit)
    • cpusets, hotplug, scheduler: fix scheduler domain breakage · 3e84050c
      Dmitry Adamushko authored
      Commit f18f982a ("sched: CPU hotplug events must not destroy scheduler
      domains created by the cpusets") introduced a hotplug-related problem as
      described below:
      
      Upon CPU_DOWN_PREPARE,
      
        update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
      
      does the following:
      
      /*
       * Force a reinitialization of the sched domains hierarchy. The domains
       * and groups cannot be updated in place without racing with the balancing
       * code, so we temporarily attach all running cpus to the NULL domain
       * which will prevent rebalancing while the sched domains are recalculated.
       */
      
The sched-domains should be rebuilt when a CPU_DOWN operation has
      completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
      CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore things to their
      initial state). That's what update_sched_domains() also does, but
      only for the !CPUSETS case.
      
      With f18f982a, sched-domains' reinitialization is delegated to
      CPUSETS code:
      
      cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
      rebuild_sched_domains()
      
Being called for CPU_UP_PREPARE (and if its callback runs after
      update_sched_domains()), it just negates all the work done by
      update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
      the sched-domains, which makes it visible to the load-balancer while
      the CPU_DOWN operation is in progress.
      
      __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
      "offline" when this function is called).
      
      try_to_wake_up() is called for one of these tasks from another CPU ->
      the load-balancer (wake_idle()) picks up a "dead" CPU and places the
      task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
      -> oops.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: miaox@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
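      A hedged sketch of the event ordering this fix restores: tear down on
      CPU_DOWN_PREPARE, rebuild only once the operation has finished. The
      function body is simplified for illustration, not the actual patch:

      static int update_sched_domains(struct notifier_block *nfb,
                                      unsigned long action, void *hcpu)
      {
              switch (action) {
              case CPU_DOWN_PREPARE:
              case CPU_DOWN_PREPARE_FROZEN:
                      /* park all cpus on the NULL domain: no balancing
                       * while the hierarchy is being recalculated */
                      detach_destroy_domains(&cpu_online_map);
                      return NOTIFY_OK;
              case CPU_DEAD:                   /* down succeeded */
              case CPU_DEAD_FROZEN:
              case CPU_DOWN_FAILED:            /* down aborted: restore */
              case CPU_DOWN_FAILED_FROZEN:
                      rebuild_sched_domains();
                      return NOTIFY_OK;
              }
              return NOTIFY_DONE;
      }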
  6. 19 Jun, 2008 (2 commits)
  7. 10 Jun, 2008 (1 commit)
  8. 07 Jun, 2008 (1 commit)
  9. 06 Jun, 2008 (1 commit)
    • sched: CPU hotplug events must not destroy scheduler domains created by the cpusets · 5c8e1ed1
      Max Krasnyansky authored
The first issue is not related to cpusets: we're simply leaking
      doms_cur. It's allocated in arch_init_sched_domains(), which is called
      for every hotplug event, so we just keep reallocating doms_cur without
      ever freeing it. I introduced a free_sched_domains() function that
      cleans things up.
      
The second issue is that sched domains created by the cpusets are
      completely destroyed by CPU hotplug events: for every hotplug event
      the scheduler attaches all CPUs to the NULL domain and then puts them
      all back into a single domain, thereby destroying the domains created
      by the cpusets (partition_sched_domains).
      The solution is simple: when cpusets are enabled, the scheduler should
      not create the default domain and should instead let cpusets do that,
      which is exactly what the patch does.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
      Cc: pj@sgi.com
      Cc: menage@google.com
      Cc: rostedt@goodmis.org
      Cc: mingo@elte.hu
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
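      A minimal sketch of the doms_cur cleanup described above (the variable
      names are taken from the message; the real free_sched_domains() may
      differ):

      /* Release the domain mask array that arch_init_sched_domains()
       * allocates, so repeated hotplug events stop leaking it. */
      static void free_sched_domains(void)
      {
              kfree(doms_cur);
              doms_cur = NULL;
      }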
  10. 09 May, 2008 (1 commit)
  11. 29 Apr, 2008 (5 commits)
  12. 28 Apr, 2008 (3 commits)
    • mempolicy: rename mpol_copy to mpol_dup · 846a16bf
      Lee Schermerhorn authored
      This patch renames mpol_copy() to mpol_dup() because, well, that's what it
      does.  Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an
      existing mempolicy, allocates a new one and copies the contents.
      
      In a later patch, I want to use the name mpol_copy() to copy the contents from
      one mempolicy to another like, e.g., strcpy() does for strings.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
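      The strdup() analogy maps onto mempolicies roughly like this (a
      sketch; the real mpol_dup() does more fixup than a plain structure
      copy):

      /* dup: allocate a fresh mempolicy and copy the old one into it,
       * the way strdup() allocates and copies a string. */
      struct mempolicy *mpol_dup(struct mempolicy *old)
      {
              struct mempolicy *new = kmem_cache_alloc(policy_cache, GFP_KERNEL);

              if (!new)
                      return ERR_PTR(-ENOMEM);
              *new = *old;                    /* shallow copy of contents */
              atomic_set(&new->refcnt, 1);    /* fresh reference count */
              return new;
      }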
    • mm: filter based on a nodemask as well as a gfp_mask · 19770b32
      Mel Gorman authored
      The MPOL_BIND policy creates a zonelist that is used for allocations
      controlled by that mempolicy.  As the per-node zonelist is already being
      filtered based on a zone id, this patch adds a version of __alloc_pages() that
      takes a nodemask for further filtering.  This eliminates the need for
      MPOL_BIND to create a custom zonelist.
      
      A positive benefit of this is that allocations using MPOL_BIND now use the
      local node's distance-ordered zonelist instead of a custom node-id-ordered
      zonelist.  I.e., pages will be allocated from the closest allowed node with
      available memory.
      
      [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
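      A hedged sketch of the filtering described above, using the
      nodemask-aware zonelist iterator this series adds;
      alloc_page_from_zone() is a hypothetical stand-in for the per-zone
      allocation step:

      static struct page *alloc_from_allowed_nodes(struct zonelist *zonelist,
                                                   gfp_t gfp, unsigned int order,
                                                   nodemask_t *allowed)
      {
              struct zoneref *z;
              struct zone *zone;

              /* one walk of the local, distance-ordered zonelist; the
               * nodemask does the MPOL_BIND filtering that a custom
               * zonelist used to do */
              for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                              gfp_zone(gfp), allowed) {
                      struct page *page = alloc_page_from_zone(zone, gfp, order);

                      if (page)
                              return page;
              }
              return NULL;
      }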
    • mm: have zonelist contain structs with both a zone pointer and zone_idx · dd1a239f
      Mel Gorman authored
Filtering zonelists requires very frequent use of zone_idx(). This is
      costly as it involves a lookup of another structure and a subtraction
      operation. As the zone_idx is often required, it should be quickly
      accessible. The node idx could also be stored here if it was found
      that accessing zone->node is significant, which may be the case on
      workloads where nodemasks are heavily used.
      
      This patch introduces a struct zoneref to store a zone pointer and a zone
      index.  The zonelist then consists of an array of these struct zonerefs which
      are looked up as necessary.  Helpers are given for accessing the zone index as
      well as the node index.
      
      [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
      [hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
      [hugh@veritas.com: just return do_try_to_free_pages]
      [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
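      In rough shape, the structure and accessors described above (a sketch
      close to, but not guaranteed identical to, the patch):

      /* One zonelist entry: the zone plus its cached index, so filtering
       * never has to recompute zone_idx() through another structure. */
      struct zoneref {
              struct zone *zone;      /* pointer to the actual zone */
              int zone_idx;           /* cached zone_idx(zone) */
      };

      static inline struct zone *zonelist_zone(struct zoneref *zref)
      {
              return zref->zone;
      }

      static inline int zonelist_zone_idx(struct zoneref *zref)
      {
              return zref->zone_idx;
      }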
  13. 20 Apr, 2008 (3 commits)
  14. 06 Mar, 2008 (1 commit)
  15. 09 Feb, 2008 (1 commit)
    • proc: seqfile convert proc_pid_status to properly handle pid namespaces · df5f8314
      Eric W. Biederman authored
Currently we possibly look up the pid in the wrong pid namespace. So
      convert proc_pid_status() to seq_file, which ensures the proper pid
      namespace is passed in.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: another build fix]
      [akpm@linux-foundation.org: s390 build fix]
      [akpm@linux-foundation.org: fix task_name() output]
      [akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
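      The shape of the conversion, sketched. The exact signature is an
      assumption based on the description, and get_task_state() is used
      purely for illustration:

      /* seq_file variant: the pid namespace is passed in explicitly, so
       * numbers are rendered relative to the reader's namespace. */
      int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
      {
              seq_printf(m, "State:\t%s\n", get_task_state(task));
              seq_printf(m, "Pid:\t%d\n", pid_nr_ns(pid, ns));
              return 0;
      }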
  16. 08 Feb, 2008 (5 commits)
  17. 26 Jan, 2008 (1 commit)
    • cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() · 86ef5c9a
      Gautham R Shenoy authored
Replace all uses of lock_cpu_hotplug/unlock_cpu_hotplug in the kernel
      with get_online_cpus and put_online_cpus, as the new names highlight
      the refcount semantics of these operations.

      The new API guarantees protection against the cpu-hotplug operation,
      but it doesn't guarantee serialized access to any of the local data
      structures. Hence the changes need to be reviewed.
      
      In case of pseries_add_processor/pseries_remove_processor, use
      cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the
      cpu_present_map there.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
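      Typical usage of the new API, as a sketch; the refcounted pairing is
      the point, and touch_per_cpu_state() is a hypothetical callee that
      still needs its own locking, per the caveat above:

      static void walk_online_cpus(void)
      {
              int cpu;

              get_online_cpus();          /* hotplug can't complete now */
              for_each_online_cpu(cpu)
                      touch_per_cpu_state(cpu);
              put_online_cpus();          /* drop the refcount */
      }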
  18. 20 Oct, 2007 (3 commits)
    • hotplug cpu: migrate a task within its cpuset · 470fd646
      Cliff Wickman authored
      When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
      been running on that cpu.
      
      Currently, such a task is migrated:
       1) to any cpu on the same node as the disabled cpu, which is both online
          and among that task's cpus_allowed
       2) to any cpu which is both online and among that task's cpus_allowed
      
      It is typical of a multithreaded application running on a large NUMA system to
      have its tasks confined to a cpuset so as to cluster them near the memory that
      they share.  Furthermore, it is typical to explicitly place such a task on a
      specific cpu in that cpuset.  And in that case the task's cpus_allowed
      includes only a single cpu.
      
This patch adds a preference to migrate such a task to some cpu within
      its cpuset (and to set its cpus_allowed to its entire cpuset).
      
With this patch, the task is migrated:
       1) to any cpu on the same node as the disabled cpu, which is both
          online and among that task's cpus_allowed
       2) to any online cpu within the task's cpuset
       3) to any cpu which is both online and among that task's cpus_allowed
      
      In order to do this, move_task_off_dead_cpu() must make a call to
      cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
      not block.  (name change - per Oleg's suggestion)
      
      Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
      the cpuset mutex during the whole migrate_live_tasks() and
      migrate_dead_tasks() procedure.
      
      [akpm@linux-foundation.org: build fix]
      [pj@sgi.com: Fix indentation and spacing]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
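      The three-step fallback above might look roughly like this (a sketch;
      the return conventions and the exact cpuset call are assumptions):

      static int pick_migration_target(struct task_struct *p, int dead_cpu)
      {
              cpumask_t node_mask = node_to_cpumask(cpu_to_node(dead_cpu));
              cpumask_t mask;
              int cpu;

              /* 1) same node, online, and in p->cpus_allowed */
              cpus_and(mask, node_mask, p->cpus_allowed);
              cpu = any_online_cpu(mask);
              if (cpu < NR_CPUS)
                      return cpu;

              /* 2) any online cpu in the whole cpuset; use the new
               * non-blocking variant since runqueue locks may be held */
              mask = cpuset_cpus_allowed_locked(p);
              cpu = any_online_cpu(mask);
              if (cpu < NR_CPUS) {
                      p->cpus_allowed = mask;  /* widen to the cpuset */
                      return cpu;
              }

              /* 3) last resort: anything online in cpus_allowed */
              return any_online_cpu(p->cpus_allowed);
      }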
    • Fix cpusets update_cpumask · 8707d8b8
      Paul Menage authored
      Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks:
      
      - collect batches of tasks under tasklist_lock and then call
        set_cpus_allowed() on them outside the lock (since this can sleep).
      
      - add a simple generic priority heap type to allow efficient collection
        of batches of tasks to be processed without duplicating or missing any
        tasks in subsequent batches.
      
      - make "cpus" file update a no-op if the mask hasn't changed
      
      - fix race between update_cpumask() and sched_setaffinity() by making
        sched_setaffinity() post-check that it's not running on any cpus outside
        cpuset_cpus_allowed().
      
      [akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
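      The batching idea, sketched. struct ptr_heap is the generic priority
      heap the patch adds; collect_cpuset_tasks() is a hypothetical helper
      assumed to take a reference on each task it stores:

      /* Collect tasks in bounded batches under tasklist_lock, then call
       * set_cpus_allowed() outside the lock because it can sleep. */
      static void update_tasks_cpumask(struct cpuset *cs, struct ptr_heap *heap)
      {
              struct task_struct *p;
              int i, n;

              do {
                      read_lock(&tasklist_lock);
                      n = collect_cpuset_tasks(cs, heap);
                      read_unlock(&tasklist_lock);

                      for (i = 0; i < n; i++) {
                              p = heap->ptrs[i];
                              set_cpus_allowed(p, cs->cpus_allowed);
                              put_task_struct(p);   /* drop collection ref */
                      }
              } while (n == heap->max);     /* heap was full: next batch */
      }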
    • cpusets: decrustify cpuset mask update code · 020958b6
      Paul Jackson authored
      Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code.
      
Other than subtle improvements in the consistency with which white
      space at the beginning and end of passed-in masks is handled, this
      doesn't make any visible difference in behaviour. But it's one or two
      hundred kernel text bytes smaller, and easier to understand.
      
      [akpm@linux-foundation.org: coding-style fix]
Signed-off-by: Paul Jackson <pj@sgi.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
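      For instance, the sort of whitespace consistency the cleanup is after
      (a sketch, not the patch itself):

      /* Trim surrounding white space once, up front, so "cpus" and
       * "mems" updates treat user input identically. */
      static int parse_cpuset_cpus(char *buf, cpumask_t *out)
      {
              char *s = strstrip(buf);        /* drop leading/trailing space */

              if (!*s) {
                      cpus_clear(*out);       /* empty input: empty mask */
                      return 0;
              }
              return cpulist_parse(s, *out);  /* e.g. "0-3,5" */
      }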