1. 28 April 2008, 1 commit
    • mm: filter based on a nodemask as well as a gfp_mask · 19770b32
      Mel Gorman authored
      The MPOL_BIND policy creates a zonelist that is used for allocations
      controlled by that mempolicy.  As the per-node zonelist is already being
      filtered based on a zone id, this patch adds a version of __alloc_pages() that
      takes a nodemask for further filtering.  This eliminates the need for
      MPOL_BIND to create a custom zonelist.
      
      A positive benefit of this is that allocations using MPOL_BIND now use the
      local node's distance-ordered zonelist instead of a custom node-id-ordered
      zonelist.  I.e., pages will be allocated from the closest allowed node with
      available memory.
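      A minimal userspace sketch of the filtering idea (this is an illustrative model only, not the kernel's code; struct zone, struct zonelist and the one-word nodemask below are toy stand-ins for the real structures):
      
      #include <stdbool.h>
      #include <stddef.h>
      
      #define MAX_ZONES 16
      
      struct zone { int node; int id; };
      struct zonelist { struct zone *zones[MAX_ZONES]; };  /* NULL-terminated, distance ordered */
      typedef unsigned long nodemask_t;                    /* toy mask: one bit per node */
      
      bool node_allowed(const nodemask_t *mask, int node)
      {
      	return !mask || (*mask & (1UL << node));     /* NULL mask means "no restriction" */
      }
      
      /* Walk the local node's distance-ordered zonelist, honouring both filters. */
      struct zone *first_usable_zone(struct zonelist *zl, int highest_zoneidx,
      			       const nodemask_t *nodemask)
      {
      	for (int i = 0; zl->zones[i]; i++) {
      		struct zone *z = zl->zones[i];
      
      		if (z->id > highest_zoneidx)
      			continue;            /* filtered by the gfp-derived zone id */
      		if (!node_allowed(nodemask, z->node))
      			continue;            /* filtered by the MPOL_BIND nodemask */
      		return z;                    /* first zone on the closest allowed node */
      	}
      	return NULL;
      }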
      
      [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      19770b32
  2. 16 April 2008, 1 commit
  3. 07 March 2008, 1 commit
  4. 22 February 2008, 1 commit
  5. 08 February 2008, 1 commit
    • SLUB: Support for performance statistics · 8ff12cfc
      Christoph Lameter authored
      The statistics provided here allow monitoring of allocator behavior at the
      cost of some (minimal) loss of performance. Counters are placed in SLUB's
      per-cpu data structure. Adding the statistics may grow the per-cpu structure
      beyond one cacheline, which increases the cache footprint of SLUB.
      
      There is a compile option to enable/disable the inclusion of the runtime
      statistics, and it is off by default.
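      
      A toy sketch of the counter scheme (the Kconfig symbol and identifiers below
      are placeholders, not SLUB's actual names): each CPU keeps its own array of
      event counters, guarded by a compile-time switch so the overhead can be
      configured away.
      
      #define CONFIG_SLAB_STATS 1          /* placeholder symbol; set to 0 to compile the counters out */
      
      enum stat_item {
      	ALLOC_FASTPATH, ALLOC_SLOWPATH,
      	FREE_FASTPATH,  FREE_SLOWPATH,
      	NR_STAT_ITEMS,
      };
      
      struct percpu_cache {                /* one instance per CPU */
      	void **freelist;
      #if CONFIG_SLAB_STATS
      	unsigned long stat[NR_STAT_ITEMS];   /* extra array may push the struct past one cacheline */
      #endif
      };
      
      static inline void stat_inc(struct percpu_cache *c, enum stat_item item)
      {
      #if CONFIG_SLAB_STATS
      	c->stat[item]++;                 /* no locking needed: the structure is per-CPU */
      #else
      	(void)c; (void)item;
      #endif
      }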
      
      The slabinfo tool is enhanced to support these statistics via two options:
      
      -D 	Switches the line of information displayed for a slab from size
      	mode to activity mode.
      
      -A	Sorts the slabs displayed by activity. This allows the display of
      	the slabs most important to the performance of a certain load.
      
      -r	Report detailed statistics on a single slab cache (implied when a
      	cache name is given).
      
      Example (tbench load):
      
      slabinfo -AD		->Shows the most active slabs
      
      Name                   Objects    Alloc     Free   %Fast (alloc free)
      skbuff_fclone_cache         33 111953835 111953835  99  99
      :0000192                  2666  5283688  5281047  99  99
      :0001024                   849  5247230  5246389  83  83
      vm_area_struct            1349   119642   118355  91  22
      :0004096                    15    66753    66751  98  98
      :0000064                  2067    25297    23383  98  78
      dentry                   10259    28635    18464  91  45
      :0000080                 11004    18950     8089  98  98
      :0000096                  1703    12358    10784  99  98
      :0000128                   762    10582     9875  94  18
      :0000512                   184     9807     9647  95  81
      :0002048                   479     9669     9195  83  65
      anon_vma                   777     9461     9002  99  71
      kmalloc-8                 6492     9981     5624  99  97
      :0000768                   258     7174     6931  58  15
      
      So the skbuff_fclone_cache is of highest importance for the tbench load.
      Pretty high load on the 192 sized slab. Look for the aliases
      
      slabinfo -a | grep 000192
      :0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
      	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
      
      Likely skbuff_head_cache.
      
      
      Looking into the statistics of the skbuff_fclone_cache is possible through
      
      slabinfo skbuff_fclone_cache	-> the -r option is implied when a cache name is given
      
      
      .... Usual output ...
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath             111953360 111946981  99  99
      Slowpath                 1044     7423   0   0
      Page Alloc                272      264   0   0
      Add partial                25      325   0   0
      Remove partial             86      264   0   0
      RemoteObj/SlabFrozen      350     4832   0   0
      Total                111954404 111954404
      
      Flushes       49 Refill        0
      Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
      
      Looks good because the fastpath is overwhelmingly taken.
      
      
      skbuff_head_cache:
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath              5297262  5259882  99  99
      Slowpath                 4477    39586   0   0
      Page Alloc                937      824   0   0
      Add partial                 0     2515   0   0
      Remove partial           1691      824   0   0
      RemoteObj/SlabFrozen     2621     9684   0   0
      Total                 5301739  5299468
      
      Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
      
      
      Descriptions of the output:
      
      Total:		The total number of allocations and frees that occurred for a
      		slab.
      
      Fastpath:	The number of allocations/frees that used the fastpath.
      
      Slowpath:	Allocations/frees that could not use the fastpath.
      
      Page Alloc:	Number of calls to the page allocator as a result of slowpath
      		processing
      
      Add Partial:	Number of slabs added to the partial list through free or
      		alloc (occurs during cpuslab flushes)
      
      Remove Partial:	Number of slabs removed from the partial list as a result of
      		allocations retrieving a partial slab or by a free freeing
      		the last object of a slab.
      
      RemoteObj/Froz:	How many times remotely freed objects were encountered when a
      		slab was about to be deactivated. Frozen: How many times a free
      		was able to skip list processing because the slab was in use
      		as the cpuslab of another processor.
      
      Flushes:	Number of times the cpuslab was flushed on request
      		(kmem_cache_shrink, may result from races in __slab_alloc)
      
      Refill:		Number of times we were able to refill the cpuslab from
      		remotely freed objects for the same slab.
      
      Deactivate:	Statistics on how slabs were deactivated. Shows how they were
      		put onto the partial list.
      
      In general, a high fastpath percentage is very good. Slowpath processing that
      avoids the partial list is also acceptable. Any touching of the partial list
      uses node-specific locks, which may cause list lock contention.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      8ff12cfc
  6. 25 January 2008, 1 commit
  7. 18 December 2007, 1 commit
  8. 17 October 2007, 3 commits
  9. 23 August 2007, 1 commit
  10. 10 August 2007, 1 commit
  11. 18 July 2007, 1 commit
    • SLUB: change error reporting format to follow lockdep loosely · 24922684
      Christoph Lameter authored
      Changes the error reporting format to loosely follow lockdep.
      
      If data corruption is detected then we generate the following lines:
      
      ============================================
      BUG <slab-cache>: <problem>
      --------------------------------------------
      
      INFO: <more information> [possibly multiple times]
      
      <object dump>
      
      FIX <slab-cache>: <remedial action>
      
      This also adds some more intelligence to the data corruption detection. It is
      now capable of figuring out the start and end of the corrupted area.
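      
      A small illustrative program for emitting the format above (the helper names
      and messages are made up for this example and are not the actual SLUB
      functions):
      
      #include <stdio.h>
      
      void report_bug(const char *cache, const char *problem)
      {
      	printf("============================================\n");
      	printf("BUG %s: %s\n", cache, problem);
      	printf("--------------------------------------------\n\n");
      }
      
      void report_fix(const char *cache, const char *action)
      {
      	printf("FIX %s: %s\n", cache, action);
      }
      
      int main(void)
      {
      	report_bug("kmalloc-8", "Redzone overwritten");
      	printf("INFO: <more information> [possibly multiple times]\n\n");
      	printf("<object dump>\n\n");
      	report_fix("kmalloc-8", "Restoring redzone");
      	return 0;
      }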
      
      Add a comment on how to configure SLUB so that a production system may
      continue to operate even though occasional slab corruption occurs through
      a misbehaving kernel component. See "Emergency operations" in
      Documentation/vm/slub.txt.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24922684
  12. 17 July 2007, 2 commits
  13. 31 May 2007, 1 commit
  14. 18 May 2007, 1 commit
  15. 17 May 2007, 1 commit
  16. 10 May 2007, 1 commit
  17. 08 May 2007, 2 commits
  18. 04 October 2006, 1 commit
  19. 23 June 2006, 2 commits
    • [PATCH] page migration: sys_move_pages(): support moving of individual pages · 742755a1
      Christoph Lameter authored
      move_pages() is used to move individual pages of a process. The function can
      be used to determine the location of pages and to move them onto the desired
      node. move_pages() returns status information for each page.
      
      long move_pages(pid, number_of_pages_to_move,
      		addresses_of_pages[],
      		nodes[] or NULL,
      		status[],
      		flags);
      
      The addresses_of_pages argument is an array of void * pointers to the
      pages to be moved (a userspace usage sketch follows the return-code list below).
      
      The nodes array contains the node numbers that the pages should be moved
      to. If a NULL is passed instead of an array then no pages are moved but
      the status array is updated. The status request may be used to determine
      the page state before issuing another move_pages() to move pages.
      
      The status array will contain the state of each individual page migration
      attempt when the function terminates. The status array is only valid if
      move_pages() completed successfully.
      
      Possible page states in status[]:
      
      0..MAX_NUMNODES	The page is now on the indicated node.
      
      -ENOENT		Page is not present
      
      -EACCES		Page is mapped by multiple processes and can only
      		be moved if MPOL_MF_MOVE_ALL is specified.
      
      -EPERM		The page has been mlocked by a process/driver and
      		cannot be moved.
      
      -EBUSY		Page is busy and cannot be moved. Try again later.
      
      -EFAULT		Invalid address (no VMA or zero page).
      
      -ENOMEM		Unable to allocate memory on target node.
      
      -EIO		Unable to write back page. The page must be written
      		back in order to move it since the page is dirty and the
      		filesystem does not provide a migration function that
      		would allow the moving of dirty pages.
      
      -EINVAL		A dirty page cannot be moved. The filesystem does not provide
      		a migration function and has no ability to write back pages.
      
      The flags parameter indicates what types of pages to move:
      
      MPOL_MF_MOVE	Move pages that are only mapped by the process.
      
      MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
      		Requires sufficient capabilities.
      
      Possible return codes from move_pages()
      
      -ENOENT		No pages found that would require moving. All pages
      		are either already on the target node, not present, had an
      		invalid address or could not be moved because they were
      		mapped by multiple processes.
      
      -EINVAL		Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
      		to migrate pages in a kernel thread.
      
      -EPERM		MPOL_MF_MOVE_ALL specified without sufficient privileges,
      		or an attempt to move pages of a process belonging to another user.
      
      -EACCES		One of the target nodes is not allowed by the current cpuset.
      
      -ENODEV		One of the target nodes is not online.
      
      -ESRCH		Process does not exist.
      
      -E2BIG		Too many pages to move.
      
      -ENOMEM		Not enough memory to allocate control array.
      
      -EFAULT		Parameters could not be accessed.
      
      A test program for move_pages() may be found with the patches
      on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
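      
      For illustration, a hedged userspace sketch of calling move_pages() for the
      current process, assuming the numaif.h wrapper from libnuma (link with -lnuma);
      this is an example of using the interface described above, not part of the patch:
      
      #include <numaif.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      
      int main(void)
      {
      	long page_size = sysconf(_SC_PAGESIZE);
      	void *page = aligned_alloc(page_size, page_size);
      	if (!page)
      		return 1;
      	memset(page, 0, page_size);          /* touch the page so it is actually present */
      
      	void *pages[1]  = { page };
      	int   nodes[1]  = { 0 };             /* desired target node */
      	int   status[1] = { -1 };
      
      	/* pid 0 means the calling process; MPOL_MF_MOVE only moves pages
      	   mapped solely by this process. */
      	long rc = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);
      	if (rc < 0)
      		perror("move_pages");
      	else
      		printf("status[0] = %d (>= 0 is the node the page now resides on)\n",
      		       status[0]);
      
      	free(page);
      	return 0;
      }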
      
      From: Christoph Lameter <clameter@sgi.com>
      
        Detailed results for sys_move_pages()
      
        Pass a pointer to an integer to get_new_page() that may be used to
        indicate where the completion status of a migration operation should be
        placed.  This allows sys_move_pages() to report back exactly what happened to
        each page.
      
        I wish there were a better way to do this; it looks a bit hacky.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      742755a1
    • 8d3c138b
  20. 20 April 2006, 1 commit
  21. 11 April 2006, 1 commit
  22. 15 March 2006, 1 commit
  23. 02 February 2006, 1 commit
  24. 07 November 2005, 1 commit
  25. 05 September 2005, 1 commit
    • [PATCH] swap: swap_lock replace list+device · 5d337b91
      Hugh Dickins authored
      The idea of a swap_device_lock per device, and a swap_list_lock over them all,
      is appealing; but in practice almost every holder of swap_device_lock must
      already hold swap_list_lock, which defeats the purpose of the split.
      
      The only exceptions have been swap_duplicate, valid_swaphandles and an
      untrodden path in try_to_unuse (plus a few places added in this series).
      valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
      demand attention.  However, with the hold time in get_swap_pages so much
      reduced, I've not yet found a load and set of swap device priorities to show
      even swap_duplicate benefitting from the split.  Certainly the split is mere
      overhead in the common case of a single swap device.
      
      So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
      (generally we seem to prefer an _ in the name, and not hide in a macro).
      
      If someone can show a regression in swap_duplicate, then probably we should
      add a hashlock for the swap_map entries alone (shorts not being atomic), so as
      to help the case of the single swap device too.
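      
      A schematic before/after of the change, with pthread mutexes standing in for
      kernel spinlocks; the data structures and function bodies are illustrative
      stand-ins that only show the lock ordering, not the real swap code:
      
      #include <pthread.h>
      
      typedef pthread_mutex_t spinlock_t;          /* userspace stand-in for a kernel spinlock */
      #define spin_lock(l)   pthread_mutex_lock(l)
      #define spin_unlock(l) pthread_mutex_unlock(l)
      
      struct swap_info { spinlock_t sdev_lock; };
      
      /* Before: nearly every path had to take both locks anyway. */
      spinlock_t swap_list_lock = PTHREAD_MUTEX_INITIALIZER;
      
      void get_swap_page_old(struct swap_info *si)
      {
      	spin_lock(&swap_list_lock);
      	spin_lock(&si->sdev_lock);
      	/* ... scan this device's swap_map ... */
      	spin_unlock(&si->sdev_lock);
      	spin_unlock(&swap_list_lock);
      }
      
      /* After: one swap_lock covers the swap list and the per-device state. */
      spinlock_t swap_lock = PTHREAD_MUTEX_INITIALIZER;
      
      void get_swap_page_new(struct swap_info *si)
      {
      	(void)si;
      	spin_lock(&swap_lock);
      	/* ... scan this device's swap_map ... */
      	spin_unlock(&swap_lock);
      }
      
      int main(void)
      {
      	struct swap_info si;
      	pthread_mutex_init(&si.sdev_lock, NULL);
      	get_swap_page_old(&si);
      	get_swap_page_new(&si);
      	return 0;
      }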
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5d337b91
  26. 17 April 2005, 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4