1. 24 9月, 2009 1 次提交
  2. 17 6月, 2009 3 次提交
    • M
      cpuset,mm: update tasks' mems_allowed in time · 58568d2a
      Miao Xie 提交于
      Fix allocating page cache/slab object on the unallowed node when memory
      spread is set by updating tasks' mems_allowed after its cpuset's mems is
      changed.
      
      In order to update tasks' mems_allowed in time, we must modify the code of
      memory policy.  Because the memory policy is applied in the process's
      context originally.  After applying this patch, one task directly
      manipulates anothers mems_allowed, and we use alloc_lock in the
      task_struct to protect mems_allowed and memory policy of the task.
      
      But in the fast path, we didn't use lock to protect them, because adding a
      lock may lead to performance regression.  But if we don't add a lock,the
      task might see no nodes when changing cpuset's mems_allowed to some
      non-overlapping set.  In order to avoid it, we set all new allowed nodes,
      then clear newly disallowed ones.
      
      [lee.schermerhorn@hp.com:
        The rework of mpol_new() to extract the adjusting of the node mask to
        apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
        with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
        allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
        node mask to mpol_new_mpolicy().
      
        Remove the now unneeded 'nodes = NULL' from mpol_new().
      
        Note that mpol_new_mempolicy() is always called with a non-NULL
        'nodes' parameter now that it has been removed from mpol_new().
        Therefore, we don't need to test nodes for NULL before testing it for
        'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
        verify this assumption.]
      [lee.schermerhorn@hp.com:
      
        I don't think the function name 'mpol_new_mempolicy' is descriptive
        enough to differentiate it from mpol_new().
      
        This function applies cpuset set context, usually constraining nodes
        to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
        is set, it also translates the nodes.  So I settled on
        'mpol_set_nodemask()', because the comment block for mpol_new() mentions
        that we need to call this function to "set nodes".
      
        Some additional minor line length, whitespace and typo cleanup.]
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58568d2a
    • M
      cpusets: update tasks' page/slab spread flags in time · 950592f7
      Miao Xie 提交于
      Fix the bug that the kernel didn't spread page cache/slab object evenly
      over all the allowed nodes when spread flags were set by updating tasks'
      page/slab spread flags in time.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      950592f7
    • M
      cpusets: restructure the function cpuset_update_task_memory_state() · f3b39d47
      Miao Xie 提交于
      The kernel still allocates the page caches on old node after modifying its
      cpuset's mems when 'memory_spread_page' was set, or it didn't spread the
      page cache evenly over all the nodes that faulting task is allowed to usr
      after memory_spread_page was set.  it is caused by the old mem_allowed and
      flags of the task, the current kernel doesn't updates them unless some
      function invokes cpuset_update_task_memory_state(), it is too late
      sometimes.We must update the mem_allowed and the flags of the tasks in
      time.
      
      Slab has the same problem.
      
      The following patches fix this bug by updating tasks' mem_allowed and
      spread flag after its cpuset's mems or spread flag is changed.
      
      This patch:
      
      Extract a function from cpuset_update_task_memory_state().  It will be
      used later for update tasks' page/slab spread flags after its cpuset's
      flag is set
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3b39d47
  3. 12 6月, 2009 1 次提交
  4. 03 4月, 2009 8 次提交
  5. 19 1月, 2009 1 次提交
    • M
      cpuset: fix possible deadlock in async_rebuild_sched_domains · f90d4118
      Miao Xie 提交于
      Lockdep reported some possible circular locking info when we tested cpuset on
      NUMA/fake NUMA box.
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.29-rc1-00224-ga6525042 #111
      -------------------------------------------------------
      bash/2968 is trying to acquire lock:
       (events){--..}, at: [<ffffffff8024c8cd>] flush_work+0x24/0xd8
      
      but task is already holding lock:
       (cgroup_mutex){--..}, at: [<ffffffff8026ad1e>] cgroup_lock_live_group+0x12/0x29
      
      which lock already depends on the new lock.
      ......
      -------------------------------------------------------
      
      Steps to reproduce:
      # mkdir /dev/cpuset
      # mount -t cpuset xxx /dev/cpuset
      # mkdir /dev/cpuset/0
      # echo 0 > /dev/cpuset/0/cpus
      # echo 0 > /dev/cpuset/0/mems
      # echo 1 > /dev/cpuset/0/memory_migrate
      # cat /dev/zero > /dev/null &
      # echo $! > /dev/cpuset/0/tasks
      
      This is because async_rebuild_sched_domains has the following lock sequence:
      run_workqueue(async_rebuild_sched_domains)
      	-> do_rebuild_sched_domains -> cgroup_lock
      
      But, attaching tasks when memory_migrate is set has following:
      cgroup_lock_live_group(cgroup_tasks_write)
      	-> do_migrate_pages -> flush_work
      
      This patch fixes it by using a separate workqueue thread.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f90d4118
  6. 16 1月, 2009 1 次提交
  7. 09 1月, 2009 8 次提交
  8. 07 1月, 2009 1 次提交
  9. 13 12月, 2008 1 次提交
    • R
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e
      Rusty Russell 提交于
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
      29c0177e
  10. 30 11月, 2008 1 次提交
    • I
      sched, cpusets: fix warning in kernel/cpuset.c · 1583715d
      Ingo Molnar 提交于
      this warning:
      
        kernel/cpuset.c: In function ‘generate_sched_domains’:
        kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function
      
      triggers because GCC does not recognize that ndoms stays uninitialized
      only if doms is NULL - but that flow is covered at the end of
      generate_sched_domains().
      
      Help out GCC by initializing this variable to 0. (that's prudent anyway)
      
      Also, this function needs a splitup and code flow simplification:
      with 160 lines length it's clearly too long.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1583715d
  11. 20 11月, 2008 1 次提交
  12. 18 11月, 2008 1 次提交
    • L
      cpuset: fix regression when failed to generate sched domains · 700018e0
      Li Zefan 提交于
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset failed to generate sched domains due to kmalloc()
      failure, the scheduler should fallback to the single partition
      'fallback_doms' and rebuild sched domains, but now it only
      destroys but not rebuilds sched domains.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) will
      only destroy sched domains and partition_sched_domains(1, NULL, NULL)
      will create the default sched domain.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      700018e0
  13. 20 10月, 2008 2 次提交
  14. 03 10月, 2008 1 次提交
  15. 14 9月, 2008 1 次提交
  16. 14 8月, 2008 1 次提交
    • M
      sched, cpuset: rework sched domains and CPU hotplug handling (v4) · cf417141
      Max Krasnyansky 提交于
      This is an updated version of my previous cpuset patch on top of
      the latest mainline git.
      The patch fixes CPU hotplug handling issues in the current cpusets code.
      Namely circular locking in rebuild_sched_domains() and unsafe access to
      the cpu_online_map in the cpuset cpu hotplug handler.
      
      This version includes changes suggested by Paul Jackson (naming, comments,
      style, etc). I also got rid of the separate workqueue thread because it is
      now safe to call get_online_cpus() from workqueue callbacks.
      
      Here are some more details:
      
      rebuild_sched_domains() is the only way to rebuild sched domains
      correctly based on the current cpuset settings. What this means
      is that we need to be able to call it from different contexts,
      like cpu hotplug for example.
      Also latest scheduler code in -tip now calls rebuild_sched_domains()
      directly from functions like arch_reinit_sched_domains().
      
      In order to support that properly we need to rework cpuset locking
      rules to avoid circular dependencies, which is what this patch does.
      New lock nesting rules are explained in the comments.
      We can now safely call rebuild_sched_domains() from virtually any
      context. The only requirement is that it needs to be called under
      get_online_cpus(). This allows cpu hotplug handlers and the scheduler
      to call rebuild_sched_domains() directly.
      The rest of the cpuset code now offloads sched domains rebuilds to
      a workqueue (async_rebuild_sched_domains()).
      
      This version of the patch addresses comments from the previous review.
      I fixed all miss-formated comments and trailing spaces.
      
      I also factored out the code that builds domain masks and split up CPU and
      memory hotplug handling. This was needed to simplify locking, to avoid unsafe
      access to the cpu_online_map from mem hotplug handler, and in general to make
      things cleaner.
      
      The patch passes moderate testing (building kernel with -j 16, creating &
      removing domains and bringing cpus off/online at the same time) on the
      quad-core2 based machine.
      
      It passes lockdep checks, even with preemptable RCU enabled.
      This time I also tested in with suspend/resume path and everything is working
      as expected.
      Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
      Acked-by: NPaul Jackson <pj@sgi.com>
      Cc: menage@google.com
      Cc: a.p.zijlstra@chello.nl
      Cc: vegard.nossum@gmail.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cf417141
  17. 31 7月, 2008 4 次提交
  18. 26 7月, 2008 3 次提交