1. 07 1月, 2006 2 次提交
  2. 31 10月, 2005 1 次提交
    • P
      [PATCH] cpusets: automatic numa mempolicy rebinding · 68860ec1
      Paul Jackson 提交于
      This patch automatically updates a tasks NUMA mempolicy when its cpuset
      memory placement changes.  It does so within the context of the task,
      without any need to support low level external mempolicy manipulation.
      
      If a system is not using cpusets, or if running on a system with just the
      root (all-encompassing) cpuset, then this remap is a no-op.  Only when a
      task is moved between cpusets, or a cpusets memory placement is changed
      does the following apply.  Otherwise, the main routine below,
      rebind_policy() is not even called.
      
      When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
      essential role of cpusets is to place jobs (several related tasks) on a set
      of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
      manage a jobs processor placement within its allowed cpuset, and the
      essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs
      memory placement within its allowed cpuset.
      
      However, CPU affinity and NUMA memory placement are managed within the
      kernel using absolute system wide numbering, not cpuset relative numbering.
      
      This is ok until a job is migrated to a different cpuset, or what's the
      same, a jobs cpuset is moved to different CPUs and Memory Nodes.
      
      Then the CPU affinity and NUMA memory placement of the tasks in the job
      need to be updated, to preserve their cpuset-relative position.  This can
      be done for CPU affinity using sched_setaffinity() from user code, as one
      task can modify anothers CPU affinity.  This cannot be done from an
      external task for NUMA memory placement, as that can only be modified in
      the context of the task using it.
      
      However, it easy enough to remap a tasks NUMA mempolicy automatically when
      a task is migrated, using the existing cpuset mechanism to trigger a
      refresh of a tasks memory placement after its cpuset has changed.  All that
      is needed is the old and new nodemask, and notice to the task that it needs
      to rebind its mempolicy.  The tasks mems_allowed has the old mask, the
      tasks cpuset has the new mask, and the existing
      cpuset_update_current_mems_allowed() mechanism provides the notice.  The
      bitmap/cpumask/nodemask remap operators provide the cpuset relative
      calculations.
      
      This patch leaves open a couple of issues:
      
       1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
      
          These mempolicies may reference nodes outside of those allowed to
          the current task by its cpuset.  Tasks are migrated as part of jobs,
          which reside on what might be several cpusets in a subtree.  When such
          a job is migrated, all NUMA memory policy references to nodes within
          that cpuset subtree should be translated, and references to any nodes
          outside that subtree should be left untouched.  A future patch will
          provide the cpuset mechanism needed to mark such subtrees.  With that
          patch, we will be able to correctly migrate these other memory policies
          across a job migration.
      
       2) Updating cpuset, affinity and memory policies in user space:
      
          This is harder.  Any placement state stored in user space using
          system-wide numbering will be invalidated across a migration.  More
          work will be required to provide user code with a migration-safe means
          to manage its cpuset relative placement, while preserving the current
          API's that pass system wide numbers, not cpuset relative numbers across
          the kernel-user boundary.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      68860ec1
  3. 30 10月, 2005 2 次提交
  4. 09 9月, 2005 1 次提交
  5. 05 9月, 2005 1 次提交
    • C
      [PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside · 6e21c8f1
      Christoph Lameter 提交于
      This patch was recently discussed on linux-mm:
      http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2
      
      I inherited a large code base from Ray for page migration.  There was a
      small patch in there that I find to be very useful since it allows the
      display of the locality of the pages in use by a process.  I reworked that
      patch and came up with a /proc/<pid>/numa_maps that gives more information
      about the vma's of a process.  numa_maps is indexes by the start address
      found in /proc/<pid>/maps.  F.e.  with this patch you can see the page use
      of the "getty" process:
      
      margin:/proc/12008 # cat maps
      00000000-00004000 r--p 00000000 00:00 0
      2000000000000000-200000000002c000 r-xp 00000000 08:04 516                /lib/ld-2.3.3.so
      2000000000038000-2000000000040000 rw-p 00028000 08:04 516                /lib/ld-2.3.3.so
      2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
      2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000260000-2000000000268000 ---p 00208000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
      2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923            /usr/lib/locale/en_US.utf8/LC_CTYPE
      2000000000300000-2000000000308000 r--s 00000000 08:04 60071467           /usr/lib/gconv/gconv-modules.cache
      2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
      4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399           /sbin/mingetty
      6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399           /sbin/mingetty
      6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0          [heap]
      60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
      60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0          [stack]
      a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]
      
      cat numa_maps
      2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
      2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
      2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
      2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
      2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
      2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
      2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
      2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
      4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
      6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      
      getty uses ld.so.  The first vma is the code segment which is used by 43
      other processes and the pages are evenly distributed over the 4 nodes.
      
      The second vma is the process specific data portion for ld.so.  This is
      only one page.
      
      The display format is:
      
      <startaddress>	 Links to information in /proc/<pid>/map
      <memory policy>  This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
      MaxRef=		<maximum reference to a page in this vma>
      Pages=		<Nr of pages in use>
      Mapped=		<Nr of pages with mapcount >
      Anon=		<nr of anonymous pages>
      Nx=		<Nr of pages on Node x>
      
      The content of the proc-file is self-evident.  If this would be tied into
      the sparsemem system then the contents of this file would not be too
      useful.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6e21c8f1
  6. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4