1. 07 December 2009: 2 commits
    • sched: Fix balance vs hotplug race · 6ad4c188
      Committed by Peter Zijlstra
      Since (e761b772: cpu hotplug, sched: Introduce cpu_active_map and redo
      sched domain management) we have cpu_active_mask, which is supposed to
      rule scheduler migration and load-balancing, except it never (fully) did.
      
      The particular problem being solved here is a crash in try_to_wake_up()
      where select_task_rq() ends up selecting an offline cpu because
      select_task_rq_fair() trusts the sched_domain tree to reflect the
      current state of affairs; similarly, select_task_rq_rt() trusts the
      root_domain.
      
      However, the sched_domains are updated from CPU_DEAD, which is after the
      cpu is taken offline and after stop_machine is done. Therefore it can
      race perfectly well with code assuming the domains are right.
      
      Cure this by building the domains from cpu_active_mask on
      CPU_DOWN_PREPARE.
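      
      A minimal sketch of the cpuset-notifier side of the fix, modelled on
      the hotplug notifier conventions of this era (function and case names
      are illustrative, not the verbatim patch): act at CPU_DOWN_PREPARE,
      when cpu_active_mask already excludes the dying cpu, instead of
      waiting for CPU_DEAD:
      
        /* Hedged sketch: rebuild sched domains *before* the cpu is
         * torn down (CPU_DOWN_PREPARE), not after (CPU_DEAD). */
        static int cpuset_track_online_cpus(struct notifier_block *nb,
                                            unsigned long phase, void *cpu)
        {
                switch (phase) {
                case CPU_ONLINE:
                case CPU_ONLINE_FROZEN:
                case CPU_DOWN_PREPARE:          /* was CPU_DEAD */
                case CPU_DOWN_PREPARE_FROZEN:
                case CPU_DOWN_FAILED:           /* cpu survived: rebuild again */
                case CPU_DOWN_FAILED_FROZEN:
                        break;
                default:
                        return NOTIFY_DONE;
                }
                /* domains are generated from cpu_active_mask, which is
                 * cleared for this cpu before DOWN_PREPARE notifiers run */
                rebuild_sched_domains();
                return NOTIFY_OK;
        }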
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpumask: Fix generate_sched_domains() for UP · e1b8090b
      Committed by Geert Uytterhoeven
      Commit acc3f5d7 ("cpumask:
      Partition_sched_domains takes array of cpumask_var_t") changed
      the function signature of generate_sched_domains() for the
      CONFIG_SMP=y case, but forgot to update the corresponding
      function for the CONFIG_SMP=n case, causing:
      
        kernel/cpuset.c:2073: warning: passing argument 1 of 'generate_sched_domains' from incompatible pointer type
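      
      The fix is a one-line signature change bringing the CONFIG_SMP=n stub
      in line with the SMP prototype; a sketch of the repaired stub (the
      body is illustrative):
      
        /* kernel/cpuset.c, CONFIG_SMP=n: take cpumask_var_t **, matching
         * the SMP version changed by acc3f5d7. */
        static int generate_sched_domains(cpumask_var_t **domains,
                                  struct sched_domain_attr **attributes)
        {
                *domains = NULL;
                return 1;       /* one (default) domain */
        }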
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <alpine.DEB.2.00.0912062038070.5693@ayla.of.borg>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 04 November 2009: 1 commit
    • cpumask: Partition_sched_domains takes array of cpumask_var_t · acc3f5d7
      Committed by Rusty Russell
      Currently partition_sched_domains() takes a 'struct cpumask
      *doms_new' which is a kmalloc'ed array of cpumask_t.  You can't
      have such an array if 'struct cpumask' is undefined, as we plan
      for CONFIG_CPUMASK_OFFSTACK=y.
      
      So, we make this an array of cpumask_var_t instead: this is the
      same for the CONFIG_CPUMASK_OFFSTACK=n case, but requires
      multiple allocations for the CONFIG_CPUMASK_OFFSTACK=y case.
      Hence we add alloc_sched_domains() and free_sched_domains()
      functions.
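      
      A sketch of what such helpers can look like, assuming the
      cpumask_var_t API (with CONFIG_CPUMASK_OFFSTACK=n, alloc_cpumask_var()
      is a no-op that always succeeds, so this degenerates to a single
      kmalloc'ed array):
      
        cpumask_var_t *alloc_sched_domains(unsigned int ndoms)
        {
                int i;
                cpumask_var_t *doms;
      
                /* one pointer (or inline mask) per domain */
                doms = kmalloc(sizeof(*doms) * ndoms, GFP_KERNEL);
                if (!doms)
                        return NULL;
                for (i = 0; i < ndoms; i++) {
                        if (!alloc_cpumask_var(&doms[i], GFP_KERNEL)) {
                                free_sched_domains(doms, i);
                                return NULL;
                        }
                }
                return doms;
        }
      
        void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms)
        {
                unsigned int i;
      
                for (i = 0; i < ndoms; i++)
                        free_cpumask_var(doms[i]);
                kfree(doms);
        }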
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <200911031453.40668.rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 24 September 2009: 1 commit
  4. 21 September 2009: 1 commit
  5. 17 June 2009: 3 commits
    • cpuset,mm: update tasks' mems_allowed in time · 58568d2a
      Committed by Miao Xie
      Fix allocation of page cache/slab objects on disallowed nodes when
      memory spread is set, by updating tasks' mems_allowed promptly after
      their cpuset's mems is changed.
      
      In order to update tasks' mems_allowed in time, we must modify the
      memory policy code, because memory policy was originally applied only
      in the process's own context. After this patch, one task directly
      manipulates another's mems_allowed, and we use alloc_lock in the
      task_struct to protect the task's mems_allowed and memory policy.
      
      In the fast path, however, we do not take the lock, because doing so
      could cause a performance regression. Without the lock, a task might
      momentarily see an empty nodemask while its cpuset's mems_allowed is
      changed to some non-overlapping set. To avoid this, we first set all
      newly allowed nodes, then clear the newly disallowed ones.
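      
      The grow-then-shrink update can be sketched as follows (modelled on
      the cpuset_change_task_nodemask() introduced by this patch; callers
      hold the task's alloc_lock around it):
      
        static void cpuset_change_task_nodemask(struct task_struct *tsk,
                                                nodemask_t *newmems)
        {
                /* 1) add the newly allowed nodes, keeping the old ones,
                 *    so a racing allocator never sees an empty mask */
                nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
                mpol_rebind_task(tsk, newmems);
                /* 2) only now drop the nodes that are no longer allowed */
                tsk->mems_allowed = *newmems;
        }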
      
      [lee.schermerhorn@hp.com:
        The rework of mpol_new() to extract the adjusting of the node mask to
        apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
        with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
        allocation.  Fix this by adding the check for MPOL_PREFERRED and an
        empty node mask to mpol_new_mempolicy().
      
        Remove the now unneeded 'nodes = NULL' from mpol_new().
      
        Note that mpol_new_mempolicy() is always called with a non-NULL
        'nodes' parameter now that it has been removed from mpol_new().
        Therefore, we don't need to test nodes for NULL before testing it for
        'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
        verify this assumption.]
      [lee.schermerhorn@hp.com:
      
        I don't think the function name 'mpol_new_mempolicy' is descriptive
        enough to differentiate it from mpol_new().
      
        This function applies the cpuset context, usually constraining nodes
        to those allowed by the cpuset.  However, when the 'RELATIVE_NODES'
        flag is set, it also translates the nodes.  So I settled on
        'mpol_set_nodemask()', because the comment block for mpol_new() mentions
        that we need to call this function to "set nodes".
      
        Some additional minor line length, whitespace and typo cleanup.]
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpusets: update tasks' page/slab spread flags in time · 950592f7
      Committed by Miao Xie
      Fix the bug that the kernel did not spread page cache/slab objects
      evenly over all the allowed nodes when the spread flags were set, by
      updating tasks' page/slab spread flags in time, as sketched below.
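      
      Sketched, the wiring: when a spread flag file is rewritten, scan every
      task in the cpuset and push the new flag into it immediately
      (cgroup_scan_tasks() and struct cgroup_scanner as in the cgroup API of
      this era; the per-task helper itself is shown under the next entry,
      f3b39d47):
      
        /* Sketch: per-task callback run by cgroup_scan_tasks() */
        static void cpuset_change_flag(struct task_struct *tsk,
                                       struct cgroup_scanner *scan)
        {
                cpuset_update_task_spread_flag(cgroup_cs(scan->cg), tsk);
        }
      
        static void update_tasks_flags(struct cpuset *cs)
        {
                struct cgroup_scanner scan;
      
                scan.cg = cs->css.cgroup;
                scan.test_task = NULL;          /* visit every task */
                scan.process_task = cpuset_change_flag;
                scan.heap = NULL;
                cgroup_scan_tasks(&scan);
        }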
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpusets: restructure the function cpuset_update_task_memory_state() · f3b39d47
      Committed by Miao Xie
      The kernel still allocates page caches on the old nodes after its
      cpuset's mems is modified while 'memory_spread_page' is set, and it
      does not spread the page cache evenly over all the nodes the faulting
      task is allowed to use once memory_spread_page is set.  This is caused
      by the task's stale mems_allowed and flags: the current kernel does
      not update them unless some function invokes
      cpuset_update_task_memory_state(), which is sometimes too late.  We
      must update tasks' mems_allowed and flags in time.
      
      Slab has the same problem.
      
      The following patches fix this bug by updating tasks' mem_allowed and
      spread flag after its cpuset's mems or spread flag is changed.
      
      This patch:
      
      Extract a function from cpuset_update_task_memory_state().  It will be
      used later to update tasks' page/slab spread flags after their
      cpuset's flags are changed.
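      
      A sketch of the extracted helper (this is the function used by the
      scan in the previous entry's sketch; it copies the cpuset's spread
      state into the task's PF_SPREAD_* flags):
      
        static void cpuset_update_task_spread_flag(struct cpuset *cs,
                                                   struct task_struct *tsk)
        {
                if (is_spread_page(cs))
                        tsk->flags |= PF_SPREAD_PAGE;
                else
                        tsk->flags &= ~PF_SPREAD_PAGE;
                if (is_spread_slab(cs))
                        tsk->flags |= PF_SPREAD_SLAB;
                else
                        tsk->flags &= ~PF_SPREAD_SLAB;
        }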
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 June 2009: 1 commit
  7. 03 April 2009: 8 commits
  8. 19 January 2009: 1 commit
    • cpuset: fix possible deadlock in async_rebuild_sched_domains · f90d4118
      Committed by Miao Xie
      Lockdep reported a possible circular locking dependency when we tested
      cpuset on a NUMA/fake-NUMA box.
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.29-rc1-00224-ga6525042 #111
      -------------------------------------------------------
      bash/2968 is trying to acquire lock:
       (events){--..}, at: [<ffffffff8024c8cd>] flush_work+0x24/0xd8
      
      but task is already holding lock:
       (cgroup_mutex){--..}, at: [<ffffffff8026ad1e>] cgroup_lock_live_group+0x12/0x29
      
      which lock already depends on the new lock.
      ......
      -------------------------------------------------------
      
      Steps to reproduce:
      # mkdir /dev/cpuset
      # mount -t cpuset xxx /dev/cpuset
      # mkdir /dev/cpuset/0
      # echo 0 > /dev/cpuset/0/cpus
      # echo 0 > /dev/cpuset/0/mems
      # echo 1 > /dev/cpuset/0/memory_migrate
      # cat /dev/zero > /dev/null &
      # echo $! > /dev/cpuset/0/tasks
      
      This is because async_rebuild_sched_domains has the following lock sequence:
      run_workqueue(async_rebuild_sched_domains)
      	-> do_rebuild_sched_domains -> cgroup_lock
      
      But attaching tasks when memory_migrate is set has the following:
      cgroup_lock_live_group(cgroup_tasks_write)
      	-> do_migrate_pages -> flush_work
      
      This patch fixes it by using a separate workqueue thread.
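      
      In sketch form (names per the description above): give cpuset a
      dedicated single-threaded workqueue, so the domain-rebuild work no
      longer runs on the generic events thread that flush_work() ends up
      waiting on:
      
        static struct workqueue_struct *cpuset_wq;
      
        /* queue the rebuild on our own thread, not keventd */
        static void async_rebuild_sched_domains(void)
        {
                queue_work(cpuset_wq, &rebuild_sched_domains_work);
        }
      
        void __init cpuset_init_smp(void)
        {
                /* ... existing init ... */
                cpuset_wq = create_singlethread_workqueue("cpuset");
                BUG_ON(!cpuset_wq);
        }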
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 16 January 2009: 1 commit
  10. 09 January 2009: 8 commits
  11. 07 January 2009: 1 commit
  12. 13 December 2008: 1 commit
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers · 29c0177e
      Committed by Rusty Russell
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
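      
      After the change, all four take struct cpumask pointers; a sketch of
      the new prototypes (consistent with include/linux/cpumask.h of this
      era):
      
        int cpumask_scnprintf(char *buf, int len, const struct cpumask *srcp);
        int cpumask_parse_user(const char __user *buf, int len,
                               struct cpumask *dstp);
        int cpulist_scnprintf(char *buf, int len, const struct cpumask *srcp);
        int cpulist_parse(const char *buf, struct cpumask *dstp);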
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  13. 30 November 2008: 1 commit
    • sched, cpusets: fix warning in kernel/cpuset.c · 1583715d
      Committed by Ingo Molnar
      This warning:
      
        kernel/cpuset.c: In function ‘generate_sched_domains’:
        kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function
      
      triggers because GCC does not recognize that ndoms stays uninitialized
      only if doms is NULL - but that flow is covered at the end of
      generate_sched_domains().
      
      Help out GCC by initializing this variable to 0. (that's prudent anyway)
      
      Also, this function needs a split-up and code-flow simplification:
      at 160 lines it is clearly too long.
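      
      The warning fix itself is a one-liner; sketched as a diff (the
      trailing comment text is illustrative):
      
        -	int ndoms;		/* number of sched domains in result */
        +	int ndoms = 0;		/* number of sched domains in result */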
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 20 November 2008: 1 commit
  15. 18 November 2008: 1 commit
    • cpuset: fix regression when failed to generate sched domains · 700018e0
      Committed by Li Zefan
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset fails to generate sched domains due to kmalloc() failure,
      the scheduler should fall back to the single partition 'fallback_doms'
      and rebuild the sched domains, but currently it only destroys them
      without rebuilding.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) will
      only destroy sched domains and partition_sched_domains(1, NULL, NULL)
      will create the default sched domain.
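      
      Sketched, the repaired semantics in partition_sched_domains()
      (variable names per the 2.6.28-era scheduler code; treat as
      illustrative): a NULL doms_new now means "rebuild the single default
      partition", not "keep nothing":
      
        if (doms_new == NULL) {
                ndoms_cur = 0;                  /* nothing survives from before */
                doms_new = &fallback_doms;      /* static single-partition fallback */
                cpus_andnot(doms_new[0], cpu_online_map, cpu_isolated_map);
                dattr_new = NULL;
        }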
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 20 October 2008: 2 commits
  17. 03 October 2008: 1 commit
  18. 14 September 2008: 1 commit
  19. 14 August 2008: 1 commit
    • sched, cpuset: rework sched domains and CPU hotplug handling (v4) · cf417141
      Committed by Max Krasnyansky
      This is an updated version of my previous cpuset patch on top of
      the latest mainline git.
      The patch fixes CPU hotplug handling issues in the current cpusets
      code, namely circular locking in rebuild_sched_domains() and unsafe
      access to the cpu_online_map in the cpuset cpu hotplug handler.
      
      This version includes changes suggested by Paul Jackson (naming, comments,
      style, etc). I also got rid of the separate workqueue thread because it is
      now safe to call get_online_cpus() from workqueue callbacks.
      
      Here are some more details:
      
      rebuild_sched_domains() is the only way to rebuild sched domains
      correctly based on the current cpuset settings. What this means
      is that we need to be able to call it from different contexts,
      like cpu hotplug for example.
      Also, the latest scheduler code in -tip now calls rebuild_sched_domains()
      directly from functions like arch_reinit_sched_domains().
      
      In order to support that properly we need to rework cpuset locking
      rules to avoid circular dependencies, which is what this patch does.
      New lock nesting rules are explained in the comments.
      We can now safely call rebuild_sched_domains() from virtually any
      context. The only requirement is that it needs to be called under
      get_online_cpus(). This allows cpu hotplug handlers and the scheduler
      to call rebuild_sched_domains() directly.
      The rest of the cpuset code now offloads sched domains rebuilds to
      a workqueue (async_rebuild_sched_domains()).
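      
      A sketch of the reworked work callback under these rules (per the
      description above; generate_sched_domains() produces the masks and
      partition_sched_domains() applies them):
      
        static void do_rebuild_sched_domains(struct work_struct *unused)
        {
                struct sched_domain_attr *attr;
                cpumask_t *doms;
                int ndoms;
      
                get_online_cpus();      /* now safe from workqueue context */
      
                /* generate domain masks while holding cgroup_lock ... */
                cgroup_lock();
                ndoms = generate_sched_domains(&doms, &attr);
                cgroup_unlock();
      
                /* ... but rebuild the domains outside of it */
                partition_sched_domains(ndoms, doms, attr);
      
                put_online_cpus();
        }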
      
      This version of the patch addresses comments from the previous review.
      I fixed all mis-formatted comments and trailing spaces.
      
      I also factored out the code that builds domain masks and split up CPU
      and memory hotplug handling. This was needed to simplify locking, to
      avoid unsafe access to the cpu_online_map from the mem hotplug handler,
      and in general to make things cleaner.
      
      The patch passes moderate testing (building a kernel with -j 16,
      creating and removing domains, and bringing cpus off/online at the
      same time) on a quad-core2 based machine.
      
      It passes lockdep checks, even with preemptable RCU enabled.
      This time I also tested it with the suspend/resume path, and everything
      is working as expected.
      Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
      Acked-by: Paul Jackson <pj@sgi.com>
      Cc: menage@google.com
      Cc: a.p.zijlstra@chello.nl
      Cc: vegard.nossum@gmail.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
  20. 31 July 2008: 3 commits