1. 14 4月, 2009 2 次提交
  2. 09 4月, 2009 1 次提交
    • L
      tracing: consolidate documents · 66bb7488
      Li Zefan 提交于
      Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to
      the new trace/ directory.
      
      I didnt find any references to those documents in both source
      files and documents, so no extra work needs to be done.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPekka Paalanen <pq@iki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      66bb7488
  3. 30 3月, 2009 1 次提交
  4. 07 1月, 2009 1 次提交
    • H
      mm: remove try_to_munlock from vmscan · 63d6c5ad
      Hugh Dickins 提交于
      An unfortunate feature of the Unevictable LRU work was that reclaiming an
      anonymous page involved an extra scan through the anon_vma: to check that
      the page is evictable before allocating swap, because the swap could not
      be freed reliably soon afterwards.
      
      Now try_to_free_swap() has replaced remove_exclusive_swap_page(), that's
      not an issue any more: remove try_to_munlock() call from
      shrink_page_list(), leaving it to try_to_munmap() to discover if the page
      is one to be culled to the unevictable list - in which case then
      try_to_free_swap().
      
      Update unevictable-lru.txt to remove comments on the try_to_munlock() in
      shrink_page_list(), and shorten some lines over 80 columns.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63d6c5ad
  5. 29 12月, 2008 3 次提交
  6. 31 10月, 2008 1 次提交
  7. 20 10月, 2008 1 次提交
  8. 15 8月, 2008 1 次提交
  9. 13 8月, 2008 1 次提交
    • R
      docsrc: build Documentation/ sources · 3794f3e8
      Randy Dunlap 提交于
      Currently source files in the Documentation/ sub-dir can easily bit-rot
      since they are not generally buildable, either because they are hidden in
      text files or because there are no Makefile rules for them.  This needs to
      be fixed so that the source files remain usable and good examples of code
      instead of bad examples.
      
      Add the ability to build source files that are in the Documentation/ dir.
      Add to Kconfig as "BUILD_DOCSRC" config symbol.
      
      Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the
      Documentation/ sources.  Or enable BUILD_DOCSRC in the *config system.
      However, this symbol depends on HEADERS_CHECK since the header files need
      to be installed (for userspace builds).
      
      Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32,
      sparc64, powerpc, sh, m68k, & mips.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Reviewed-by: NSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3794f3e8
  10. 27 7月, 2008 1 次提交
  11. 25 7月, 2008 1 次提交
    • N
      hugetlb: new sysfs interface · a3437870
      Nishanth Aravamudan 提交于
      Provide new hugepages user APIs that are more suited to multiple hstates
      in sysfs.  There is a new directory, /sys/kernel/hugepages.  Underneath
      that directory there will be a directory per-supported hugepage size,
      e.g.:
      
      /sys/kernel/hugepages/hugepages-64kB
      /sys/kernel/hugepages/hugepages-16384kB
      /sys/kernel/hugepages/hugepages-16777216kB
      
      corresponding to 64k, 16m and 16g respectively.  Within each
      hugepages-size directory there are a number of files, corresponding to the
      tracked counters in the hstate, e.g.:
      
      /sys/kernel/hugepages/hugepages-64/nr_hugepages
      /sys/kernel/hugepages/hugepages-64/nr_overcommit_hugepages
      /sys/kernel/hugepages/hugepages-64/free_hugepages
      /sys/kernel/hugepages/hugepages-64/resv_hugepages
      /sys/kernel/hugepages/hugepages-64/surplus_hugepages
      
      Of these files, the first two are read-write and the latter three are
      read-only.  The size of the hugepage being manipulated is trivially
      deducible from the enclosing directory and is always expressed in kB (to
      match meminfo).
      
      [dave@linux.vnet.ibm.com: fix build]
      [nacc@us.ibm.com: hugetlb: hang off of /sys/kernel/mm rather than /sys/kernel]
      [nacc@us.ibm.com: hugetlb: remove CONFIG_SYSFS dependency]
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3437870
  12. 05 7月, 2008 1 次提交
  13. 07 6月, 2008 1 次提交
  14. 02 5月, 2008 1 次提交
  15. 28 4月, 2008 7 次提交
    • L
      mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy · fc36b8d3
      Lee Schermerhorn 提交于
      Now that we're using "preferred local" policy for system default, we need to
      make this as fast as possible.  Because of the variable size of the mempolicy
      structure [based on size of nodemasks], the preferred_node may be in a
      different cacheline from the mode.  This can result in accessing an extra
      cacheline in the normal case of system default policy.  Suspect this is the
      cause of an observed 2-3% slowdown in page fault testing relative to kernel
      without this patch series.
      
      To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy
      flags member which is guaranteed [?] to be in the same cacheline as the mode
      itself.
      
      Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1
      for both anon and shmem segments with system default and vma [preferred local]
      policy.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc36b8d3
    • L
      mempolicy: use MPOL_PREFERRED for system-wide default policy · bea904d5
      Lee Schermerhorn 提交于
      Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API
      [set_mempolicy(), mbind() and internal versions], the kernel simply installs a
      NULL struct mempolicy pointer in the appropriate context: task policy, vma
      policy, or shared policy.  This causes any use of that policy to "fall back"
      to the next most specific policy scope.
      
      The only use of MPOL_DEFAULT to mean "local allocation" is in the system
      default policy.  This requires extra checks/cases for MPOL_DEFAULT in many
      mempolicy.c functions.
      
      There is another, "preferred" way to specify local allocation via the APIs.
      That is using the MPOL_PREFERRED policy mode with an empty nodemask.
      Internally, the empty nodemask gets converted to a preferred_node id of '-1'.
      All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the
      node local to the cpu where the allocation occurs.
      
      System default policy, except during boot, is hard-coded to "local
      allocation".  By using the MPOL_PREFERRED mode with a negative value of
      preferred node for system default policy, MPOL_DEFAULT will never occur in the
      'policy' member of a struct mempolicy.  Thus, we can remove all checks for
      MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation
      paths.
      
      In slab_node() return local node id when policy pointer is NULL.  No need to
      set a pol value to take the switch default.  Replace switch default with
      BUG()--i.e., shouldn't happen.
      
      With this patch MPOL_DEFAULT is only used in the APIs, including internal
      calls to do_set_mempolicy() and in the display of policy in
      /proc/<pid>/numa_maps.  It always means "fall back" to the the next most
      specific policy scope.  This simplifies the description of memory policies
      quite a bit, with no visible change in behavior.
      
      get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when
      the requested policy [task or vma/shared] is NULL.  These are the values one
      would supply via set_mempolicy() or mbind() to achieve that condition--default
      behavior.
      
      This patch updates Documentation to reflect this change.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bea904d5
    • L
      mempolicy: rework mempolicy Reference Counting [yet again] · 52cd3b07
      Lee Schermerhorn 提交于
      After further discussion with Christoph Lameter, it has become clear that my
      earlier attempts to clean up the mempolicy reference counting were a bit of
      overkill in some areas, resulting in superflous ref/unref in what are usually
      fast paths.  In other areas, further inspection reveals that I botched the
      unref for interleave policies.
      
      A separate patch, suitable for upstream/stable trees, fixes up the known
      errors in the previous attempt to fix reference counting.
      
      This patch reworks the memory policy referencing counting and, one hopes,
      simplifies the code.  Maybe I'll get it right this time.
      
      See the update to the numa_memory_policy.txt document for a discussion of
      memory policy reference counting that motivates this patch.
      
      Summary:
      
      Lookup of mempolicy, based on (vma, address) need only add a reference for
      shared policy, and we need only unref the policy when finished for shared
      policies.  So, this patch backs out all of the unneeded extra reference
      counting added by my previous attempt.  It then unrefs only shared policies
      when we're finished with them, using the mpol_cond_put() [conditional put]
      helper function introduced by this patch.
      
      Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
      containing just the policy.  read_swap_cache_async() can call alloc_page_vma()
      multiple times, so we can't let alloc_page_vma() unref the shared policy in
      this case.  To avoid this, we make a copy of any non-null shared policy and
      remove the MPOL_F_SHARED flag from the copy.  This copy occurs before reading
      a page [or multiple pages] from swap, so the overhead should not be an issue
      here.
      
      I introduced a new static inline function "mpol_cond_copy()" to copy the
      shared policy to an on-stack policy and remove the flags that would require a
      conditional free.  The current implementation of mpol_cond_copy() assumes that
      the struct mempolicy contains no pointers to dynamically allocated structures
      that must be duplicated or reference counted during copy.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52cd3b07
    • L
      mempolicy: rename struct mempolicy 'policy' member to 'mode' · 45c4745a
      Lee Schermerhorn 提交于
      The terms 'policy' and 'mode' are both used in various places to describe the
      semantics of the value stored in the 'policy' member of struct mempolicy.
      Furthermore, the term 'policy' is used to refer to that member, to the entire
      struct mempolicy and to the more abstract concept of the tuple consisting of a
      "mode" and an optional node or set of nodes.  Recently, we have added "mode
      flags" that are passed in the upper bits of the 'mode' [or sometimes,
      'policy'] member of the numa APIs.
      
      I'd like to resolve this confusion, which perhaps only exists in my mind, by
      renaming the 'policy' member to 'mode' throughout, and fixing up the
      Documentation.  Man pages will be updated separately.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45c4745a
    • D
      mempolicy: disallow static or relative flags for local preferred mode · 3e1f0645
      David Rientjes 提交于
      MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for
      MPOL_PREFERRED policies that were created with an empty nodemask (for purely
      local allocations).  They'll never be invalidated because the allowed mems of
      a task changes or need to be rebound relative to a cpuset's placement.
      
      Also fixes a bug identified by Lee Schermerhorn that disallowed empty
      nodemasks to be passed to MPOL_PREFERRED to specify local allocations.  [A
      different, somewhat incomplete, patch already existed in 25-rc5-mm1.]
      
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e1f0645
    • D
      mempolicy: update NUMA memory policy documentation · 65d66fc0
      David Rientjes 提交于
      Updates Documentation/vm/numa_memory_policy.txt and
      Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags.
      
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65d66fc0
    • M
      mm: filter based on a nodemask as well as a gfp_mask · 19770b32
      Mel Gorman 提交于
      The MPOL_BIND policy creates a zonelist that is used for allocations
      controlled by that mempolicy.  As the per-node zonelist is already being
      filtered based on a zone id, this patch adds a version of __alloc_pages() that
      takes a nodemask for further filtering.  This eliminates the need for
      MPOL_BIND to create a custom zonelist.
      
      A positive benefit of this is that allocations using MPOL_BIND now use the
      local node's distance-ordered zonelist instead of a custom node-id-ordered
      zonelist.  I.e., pages will be allocated from the closest allowed node with
      available memory.
      
      [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
      [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19770b32
  16. 27 4月, 2008 1 次提交
  17. 16 4月, 2008 1 次提交
  18. 07 3月, 2008 1 次提交
  19. 22 2月, 2008 1 次提交
  20. 08 2月, 2008 1 次提交
    • C
      SLUB: Support for performance statistics · 8ff12cfc
      Christoph Lameter 提交于
      The statistics provided here allow the monitoring of allocator behavior but
      at the cost of some (minimal) loss of performance. Counters are placed in
      SLUB's per cpu data structure. The per cpu structure may be extended by the
      statistics to grow larger than one cacheline which will increase the cache
      footprint of SLUB.
      
      There is a compile option to enable/disable the inclusion of the runtime
      statistics and its off by default.
      
      The slabinfo tool is enhanced to support these statistics via two options:
      
      -D 	Switches the line of information displayed for a slab from size
      	mode to activity mode.
      
      -A	Sorts the slabs displayed by activity. This allows the display of
      	the slabs most important to the performance of a certain load.
      
      -r	Report option will report detailed statistics on
      
      Example (tbench load):
      
      slabinfo -AD		->Shows the most active slabs
      
      Name                   Objects    Alloc     Free   %Fast
      skbuff_fclone_cache         33 111953835 111953835  99  99
      :0000192                  2666  5283688  5281047  99  99
      :0001024                   849  5247230  5246389  83  83
      vm_area_struct            1349   119642   118355  91  22
      :0004096                    15    66753    66751  98  98
      :0000064                  2067    25297    23383  98  78
      dentry                   10259    28635    18464  91  45
      :0000080                 11004    18950     8089  98  98
      :0000096                  1703    12358    10784  99  98
      :0000128                   762    10582     9875  94  18
      :0000512                   184     9807     9647  95  81
      :0002048                   479     9669     9195  83  65
      anon_vma                   777     9461     9002  99  71
      kmalloc-8                 6492     9981     5624  99  97
      :0000768                   258     7174     6931  58  15
      
      So the skbuff_fclone_cache is of highest importance for the tbench load.
      Pretty high load on the 192 sized slab. Look for the aliases
      
      slabinfo -a | grep 000192
      :0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
      	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
      
      Likely skbuff_head_cache.
      
      
      Looking into the statistics of the skbuff_fclone_cache is possible through
      
      slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned
      
      
      .... Usual output ...
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath             111953360 111946981  99  99
      Slowpath                 1044     7423   0   0
      Page Alloc                272      264   0   0
      Add partial                25      325   0   0
      Remove partial             86      264   0   0
      RemoteObj/SlabFrozen      350     4832   0   0
      Total                111954404 111954404
      
      Flushes       49 Refill        0
      Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
      
      Looks good because the fastpath is overwhelmingly taken.
      
      
      skbuff_head_cache:
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath              5297262  5259882  99  99
      Slowpath                 4477    39586   0   0
      Page Alloc                937      824   0   0
      Add partial                 0     2515   0   0
      Remove partial           1691      824   0   0
      RemoteObj/SlabFrozen     2621     9684   0   0
      Total                 5301739  5299468
      
      Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
      
      
      Descriptions of the output:
      
      Total:		The total number of allocation and frees that occurred for a
      		slab
      
      Fastpath:	The number of allocations/frees that used the fastpath.
      
      Slowpath:	Other allocations
      
      Page Alloc:	Number of calls to the page allocator as a result of slowpath
      		processing
      
      Add Partial:	Number of slabs added to the partial list through free or
      		alloc (occurs during cpuslab flushes)
      
      Remove Partial:	Number of slabs removed from the partial list as a result of
      		allocations retrieving a partial slab or by a free freeing
      		the last object of a slab.
      
      RemoteObj/Froz:	How many times were remotely freed object encountered when a
      		slab was about to be deactivated. Frozen: How many times was
      		free able to skip list processing because the slab was in use
      		as the cpuslab of another processor.
      
      Flushes:	Number of times the cpuslab was flushed on request
      		(kmem_cache_shrink, may result from races in __slab_alloc)
      
      Refill:		Number of times we were able to refill the cpuslab from
      		remotely freed objects for the same slab.
      
      Deactivate:	Statistics how slabs were deactivated. Shows how they were
      		put onto the partial list.
      
      In general fastpath is very good. Slowpath without partial list processing is
      also desirable. Any touching of partial list uses node specific locks which
      may potentially cause list lock contention.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      8ff12cfc
  21. 25 1月, 2008 1 次提交
  22. 18 12月, 2007 1 次提交
  23. 17 10月, 2007 3 次提交
  24. 23 8月, 2007 1 次提交
  25. 10 8月, 2007 1 次提交
  26. 18 7月, 2007 1 次提交
    • C
      SLUB: change error reporting format to follow lockdep loosely · 24922684
      Christoph Lameter 提交于
      Changes the error reporting format to loosely follow lockdep.
      
      If data corruption is detected then we generate the following lines:
      
      ============================================
      BUG <slab-cache>: <problem>
      --------------------------------------------
      
      INFO: <more information> [possibly multiple times]
      
      <object dump>
      
      FIX <slab-cache>: <remedial action>
      
      This also adds some more intelligence to the data corruption detection. Its
      now capable of figuring out the start and end.
      
      Add a comment on how to configure SLUB so that a production system may
      continue to operate even though occasional slab corruption occur through
      a misbehaving kernel component. See "Emergency operations" in
      Documentation/vm/slub.txt.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24922684
  27. 17 7月, 2007 2 次提交
  28. 31 5月, 2007 1 次提交