1. 09 1月, 2006 24 次提交
    • A
      [PATCH] kernel/: small cleanups · 97a41e26
      Adrian Bunk 提交于
      This patch contains the following cleanups:
      - make needlessly global functions static
      - every file should include the headers containing the prototypes for
        it's global functions
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Acked-by: N"Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      97a41e26
    • P
      [PATCH] cpuset: skip rcu check if task is in root cpuset · 03a285f5
      Paul Jackson 提交于
      For systems that aren't using cpusets, but have them CONFIG_CPUSET enabled in
      their kernel (eventually this may be most distribution kernels), this patch
      removes even the minimal rcu_read_lock() from the memory page allocation path.
      
      Actually, it removes that rcu call for any task that is in the root cpuset
      (top_cpuset), which on systems not actively using cpusets, is all tasks.
      
      We don't need the rcu check for tasks in the top_cpuset, because the
      top_cpuset is statically allocated, so at no risk of being freed out from
      underneath us.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      03a285f5
    • P
      [PATCH] cpuset: mark number_of_cpusets read_mostly · 7edc5962
      Paul Jackson 提交于
      Mark cpuset global 'number_of_cpusets' as __read_mostly.
      
      This global is accessed everytime a zone is considered in the zonelist loops
      beneath __alloc_pages, looking for a free memory page.  If number_of_cpusets
      is just one, then we can short circuit the mems_allowed check.
      
      Since this global is read alot on a hot path, and written rarely, it is an
      excellent candidate for __read_mostly.
      
      Thanks to Christoph Lameter for the suggestion.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7edc5962
    • P
      [PATCH] cpuset: use rcu directly optimization · 6b9c2603
      Paul Jackson 提交于
      Optimize the cpuset impact on page allocation, the most performance critical
      cpuset hook in the kernel.
      
      On each page allocation, the cpuset hook needs to check for a possible change
      in the current tasks cpuset.  It can now handle the common case, of no change,
      without taking any spinlock or semaphore, thanks to RCU.
      
      Convert a spinlock on the current task to an rcu_read_lock(), saving
      approximately a memory barrier and an atomic op, depending on architecture.
      
      This is done by adding rcu_assign_pointer() and synchronize_rcu() calls to the
      write side of the task->cpuset pointer, in cpuset.c:attach_task(), to delay
      freeing up a detached cpuset until after any critical sections referencing
      that pointer.
      
      Thanks to Andi Kleen, Nick Piggin and Eric Dumazet for ideas.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6b9c2603
    • P
      [PATCH] cpuset: remove test for null cpuset from alloc code path · c417f024
      Paul Jackson 提交于
      Remove a couple of more lines of code from the cpuset hooks in the page
      allocation code path.
      
      There was a check for a NULL cpuset pointer in the routine
      cpuset_update_task_memory_state() that was only needed during system boot,
      after the memory subsystem was initialized, before the cpuset subsystem was
      initialized, to catch a NULL task->cpuset pointer.
      
      Add a cpuset_init_early() routine, just before the mem_init() call in
      init/main.c, that sets up just enough of the init tasks cpuset structure to
      render cpuset_update_task_memory_state() calls harmless.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c417f024
    • P
      [PATCH] cpuset: migrate all tasks in cpuset at once · 04c19fa6
      Paul Jackson 提交于
      Given the mechanism in the previous patch to handle rebinding the per-vma
      mempolicies of all tasks in a cpuset that changes its memory placement, it is
      now easier to handle the page migration requirements of such tasks at the same
      time.
      
      The previous code didn't actually attempt to migrate the pages of the tasks in
      a cpuset whose memory placement changed until the next time each such task
      tried to allocate memory.  This was undesirable, as users invoking memory page
      migration exected to happen when the placement changed, not some unspecified
      time later when the task needed more memory.
      
      It is now trivial to handle the page migration at the same time as the per-vma
      rebinding is done.
      
      The routine cpuset.c:update_nodemask(), which handles changing a cpusets
      memory placement ('mems') now checks for the special case of being asked to
      write a placement that is the same as before.  It was harmless enough before
      to just recompute everything again, even though nothing had changed.  But page
      migration is a heavy weight operation - moving pages about.  So now it is
      worth avoiding that if asked to move a cpuset to its current location.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      04c19fa6
    • P
      [PATCH] cpuset: rebind vma mempolicies fix · 4225399a
      Paul Jackson 提交于
      Fix more of longstanding bug in cpuset/mempolicy interaction.
      
      NUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset
      to just the Memory Nodes allowed by that cpuset.  The kernel maintains
      internal state for each mempolicy, tracking what nodes are used for the
      MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.
      
      When a tasks cpuset memory placement changes, whether because the cpuset
      changed, or because the task was attached to a different cpuset, then the
      tasks mempolicies have to be rebound to the new cpuset placement, so as to
      preserve the cpuset-relative numbering of the nodes in that policy.
      
      An earlier fix handled such mempolicy rebinding for mempolicies attached to a
      task.
      
      This fix rebinds mempolicies attached to vma's (address ranges in a tasks
      address space.) Due to the need to hold the task->mm->mmap_sem semaphore while
      updating vma's, the rebinding of vma mempolicies has to be done when the
      cpuset memory placement is changed, at which time mmap_sem can be safely
      acquired.  The tasks mempolicy is rebound later, when the task next attempts
      to allocate memory and notices that its task->cpuset_mems_generation is
      out-of-date with its cpusets mems_generation.
      
      Because walking the tasklist to find all tasks attached to a changing cpuset
      requires holding tasklist_lock, a spinlock, one cannot update the vma's of the
      affected tasks while doing the tasklist scan.  In general, one cannot acquire
      a semaphore (which can sleep) while already holding a spinlock (such as
      tasklist_lock).  So a list of mm references has to be built up during the
      tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem
      acquired, and the vma's in that mm rebound.
      
      Once the tasklist lock is dropped, affected tasks may fork new tasks, before
      their mm's are rebound.  A kernel global 'cpuset_being_rebound' is set to
      point to the cpuset being rebound (there can only be one; cpuset modifications
      are done under a global 'manage_sem' semaphore), and the mpol_copy code that
      is used to copy a tasks mempolicies during fork catches such forking tasks,
      and ensures their children are also rebound.
      
      When a task is moved to a different cpuset, it is easier, as there is only one
      task involved.  It's mm->vma's are scanned, using the same
      mpol_rebind_policy() as used above.
      
      It may happen that both the mpol_copy hook and the update done via the
      tasklist scan update the same mm twice.  This is ok, as the mempolicies of
      each vma in an mm keep track of what mems_allowed they are relative to, and
      safely no-op a second request to rebind to the same nodes.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4225399a
    • P
      [PATCH] cpuset: number_of_cpusets optimization · 202f72d5
      Paul Jackson 提交于
      Easy little optimization hack to avoid actually having to call
      cpuset_zone_allowed() and check mems_allowed, in the main page allocation
      routine, __alloc_pages().  This saves several CPU cycles per page allocation
      on systems not using cpusets.
      
      A counter is updated each time a cpuset is created or removed, and whenever
      there is only one cpuset in the system, it must be the root cpuset, which
      contains all CPUs and all Memory Nodes.  In that case, when the counter is
      one, all allocations are allowed.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      202f72d5
    • P
      [PATCH] cpuset: numa_policy_rebind cleanup · 74cb2155
      Paul Jackson 提交于
      Cleanup, reorganize and make more robust the mempolicy.c code to rebind
      mempolicies relative to the containing cpuset after a tasks memory placement
      changes.
      
      The real motivator for this cleanup patch is to lay more groundwork for the
      upcoming patch to correctly rebind NUMA mempolicies that are attached to vma's
      after the containing cpuset memory placement changes.
      
      NUMA mempolicies are constrained by the cpuset their task is a member of.
      When either (1) a task is moved to a different cpuset, or (2) the 'mems'
      mems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded
      node numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to
      be recalculated, relative to their new cpuset placement.
      
      The old code used an unreliable method of determining what was the old
      mems_allowed constraining the mempolicy.  It just looked at the tasks
      mems_allowed value.  This sort of worked with the present code, that just
      rebinds the -task- mempolicy, and leaves any -vma- mempolicies broken,
      referring to the old nodes.  But in an upcoming patch, the vma mempolicies
      will be rebound as well.  Then the order in which the various task and vma
      mempolicies are updated will no longer be deterministic, and one can no longer
      count on the task->mems_allowed holding the old value for as long as needed.
      It's not even clear if the current code was guaranteed to work reliably for
      task mempolicies.
      
      So I added a mems_allowed field to each mempolicy, stating exactly what
      mems_allowed the policy is relative to, and updated synchronously and reliably
      anytime that the mempolicy is rebound.
      
      Also removed a useless wrapper routine, numa_policy_rebind(), and had its
      caller, cpuset_update_task_memory_state(), call directly to the rewritten
      policy_rebind() routine, and made that rebind routine extern instead of
      static, and added a "mpol_" prefix to its name, making it
      mpol_rebind_policy().
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      74cb2155
    • P
      [PATCH] cpuset: implement cpuset_mems_allowed · 909d75a3
      Paul Jackson 提交于
      Provide a cpuset_mems_allowed() method, which the sys_migrate_pages() code
      needed, to obtain the mems_allowed vector of a cpuset, and replaced the
      workaround in sys_migrate_pages() to call this new method.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      909d75a3
    • P
      [PATCH] cpuset: combine refresh_mems and update_mems · cf2a473c
      Paul Jackson 提交于
      The important code paths through alloc_pages_current() and alloc_page_vma(),
      by which most kernel page allocations go, both called
      cpuset_update_current_mems_allowed(), which in turn called refresh_mems().
      -Both- of these latter two routines did a tasklock, got the tasks cpuset
      pointer, and checked for out of date cpuset->mems_generation.
      
      That was a silly duplication of code and waste of CPU cycles on an important
      code path.
      
      Consolidated those two routines into a single routine, called
      cpuset_update_task_memory_state(), since it updates more than just
      mems_allowed.
      
      Changed all callers of either routine to call the new consolidated routine.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cf2a473c
    • P
      [PATCH] cpuset: fork hook fix · b4b26418
      Paul Jackson 提交于
      Fix obscure, never seen in real life, cpuset fork race.  The cpuset_fork()
      call in fork.c was setting up the correct task->cpuset pointer after the
      tasklist_lock was dropped, which briefly exposed the newly forked process with
      an unsafe (copied from parent without locks or usage counter increment) cpuset
      pointer.
      
      In theory, that exposed cpuset pointer could have been pointing at a cpuset
      that was already freed and removed, and in theory another task that had been
      sitting on the tasklist_lock waiting to scan the task list could have raced
      down the entire tasklist, found our new child at the far end, and dereferenced
      that bogus cpuset pointer.
      
      To fix, setup up the correct cpuset pointer in the new child by calling
      cpuset_fork() before the new task is linked into the tasklist, and with that,
      add a fork failure case, to dereference that cpuset, if the fork fails along
      the way, after cpuset_fork() was called.
      
      Had to remove a BUG_ON() from cpuset_exit(), because it was no longer valid -
      the call to cpuset_exit() from a failed fork would not have PF_EXITING set.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b4b26418
    • P
      [PATCH] cpuset: update_nodemask code reformat · 59dac16f
      Paul Jackson 提交于
      Restructure code layout of the kernel/cpuset.c update_nodemask() routine,
      removing embedded returns and nested if's in favor of goto completion labels.
      This is being done in anticipation of adding more logic to this routine, which
      will favor the goto style structure.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      59dac16f
    • P
      [PATCH] cpuset: minor spacing initializer fixes · c5b2aff8
      Paul Jackson 提交于
      Four trivial cpuset fixes: remove extra spaces, remove useless initializers,
      mark one __read_mostly.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c5b2aff8
    • P
      [PATCH] cpuset: memory pressure meter · 3e0d98b9
      Paul Jackson 提交于
      Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
      that the tasks in a cpuset call try_to_free_pages(), the synchronous
      (direct) memory reclaim code.
      
      This enables batch managers monitoring jobs running in dedicated cpusets to
      efficiently detect what level of memory pressure that job is causing.
      
      This is useful both on tightly managed systems running a wide mix of
      submitted jobs, which may choose to terminate or reprioritize jobs that are
      trying to use more memory than allowed on the nodes assigned them, and with
      tightly coupled, long running, massively parallel scientific computing jobs
      that will dramatically fail to meet required performance goals if they
      start to use more memory than allowed to them.
      
      This patch just provides a very economical way for the batch manager to
      monitor a cpuset for signs of memory pressure.  It's up to the batch
      manager or other user code to decide what to do about it and take action.
      
      ==> Unless this feature is enabled by writing "1" to the special file
          /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
          code of __alloc_pages() for this metric reduces to simply noticing
          that the cpuset_memory_pressure_enabled flag is zero.  So only
          systems that enable this feature will compute the metric.
      
      Why a per-cpuset, running average:
      
          Because this meter is per-cpuset, rather than per-task or mm, the
          system load imposed by a batch scheduler monitoring this metric is
          sharply reduced on large systems, because a scan of the tasklist can be
          avoided on each set of queries.
      
          Because this meter is a running average, instead of an accumulating
          counter, a batch scheduler can detect memory pressure with a single
          read, instead of having to read and accumulate results for a period of
          time.
      
          Because this meter is per-cpuset rather than per-task or mm, the
          batch scheduler can obtain the key information, memory pressure in a
          cpuset, with a single read, rather than having to query and accumulate
          results over all the (dynamically changing) set of tasks in the cpuset.
      
      A per-cpuset simple digital filter (requires a spinlock and 3 words of data
      per-cpuset) is kept, and updated by any task attached to that cpuset, if it
      enters the synchronous (direct) page reclaim code.
      
      A per-cpuset file provides an integer number representing the recent
      (half-life of 10 seconds) rate of direct page reclaims caused by the tasks
      in the cpuset, in units of reclaims attempted per second, times 1000.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3e0d98b9
    • P
      [PATCH] cpuset: mempolicy one more nodemask conversion · 5966514d
      Paul Jackson 提交于
      Finish converting mm/mempolicy.c from bitmaps to nodemasks.  The previous
      conversion had left one routine using bitmaps, since it involved a
      corresponding change to kernel/cpuset.c
      
      Fix that interface by replacing with a simple macro that calls nodes_subset(),
      or if !CONFIG_CPUSET, returns (1).
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5966514d
    • P
      [PATCH] Simpler signal-exit concurrency handling · 2d89c929
      Paul E. McKenney 提交于
      Some simplification in checking signal delivery against concurrent exit.
      Instead of using get_task_struct_rcu(), which increments the task_struct
      reference count, check the reference count after acquiring sighand lock.
      Signed-off-by: N"Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2d89c929
    • I
      [PATCH] RCU signal handling · e56d0903
      Ingo Molnar 提交于
      RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
      instead of tasklist_lock read-locked.  This is a scalability improvement on
      SMP and a preemption-latency improvement under PREEMPT_RCU.
      Signed-off-by: NPaul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NWilliam Irwin <wli@holomorphy.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e56d0903
    • R
      [PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp macros · 22fc6ecc
      Ravikiran G Thirumalai 提交于
      ____cacheline_maxaligned_in_smp is currently used to align critical structures
      and avoid false sharing.  It uses per-arch L1_CACHE_SHIFT_MAX and people find
      L1_CACHE_SHIFT_MAX useless.
      
      However, we have been using ____cacheline_maxaligned_in_smp to align
      structures on the internode cacheline size.  As per Andi's suggestion,
      following patch kills ____cacheline_maxaligned_in_smp and introduces
      INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
      Arches needing L3/Internode cacheline alignment can define
      INTERNODE_CACHE_SHIFT in the arch asm/cache.h.  Patch replaces
      ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp
      
      With this patch, L1_CACHE_SHIFT_MAX can be killed
      Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NShai Fultheim <shai@scalex86.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      22fc6ecc
    • P
      [PATCH] cpusets: swap migration interface · 45b07ef3
      Paul Jackson 提交于
      Add a boolean "memory_migrate" to each cpuset, represented by a file
      containing "0" or "1" in each directory below /dev/cpuset.
      
      It defaults to false (file contains "0").  It can be set true by writing
      "1" to the file.
      
      If true, then anytime that a task is attached to the cpuset so marked, the
      pages of that task will be moved to that cpuset, preserving, to the extent
      practical, the cpuset-relative placement of the pages.
      
      Also anytime that a cpuset so marked has its memory placement changed (by
      writing to its "mems" file), the tasks in that cpuset will have their pages
      moved to the cpusets new nodes, preserving, to the extent practical, the
      cpuset-relative placement of the moved pages.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      45b07ef3
    • C
      [PATCH] Swap Migration V5: sys_migrate_pages interface · 39743889
      Christoph Lameter 提交于
      sys_migrate_pages implementation using swap based page migration
      
      This is the original API proposed by Ray Bryant in his posts during the first
      half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.
      
      The intent of sys_migrate is to migrate memory of a process.  A process may
      have migrated to another node.  Memory was allocated optimally for the prior
      context.  sys_migrate_pages allows to shift the memory to the new node.
      
      sys_migrate_pages is also useful if the processes available memory nodes have
      changed through cpuset operations to manually move the processes memory.  Paul
      Jackson is working on an automated mechanism that will allow an automatic
      migration if the cpuset of a process is changed.  However, a user may decide
      to manually control the migration.
      
      This implementation is put into the policy layer since it uses concepts and
      functions that are also needed for mbind and friends.  The patch also provides
      a do_migrate_pages function that may be useful for cpusets to automatically
      move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
      implementation.
      
      The current code here is based on the swap based page migration capability and
      thus is not able to preserve the physical layout relative to it containing
      nodeset (which may be a cpuset).  When direct page migration becomes available
      then the implementation needs to be changed to do a isomorphic move of pages
      between different nodesets.  The current implementation simply evicts all
      pages in source nodeset that are not in the target nodeset.
      
      Patch supports ia64, i386 and x86_64.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39743889
    • C
      [PATCH] add schedule_on_each_cpu() · 15316ba8
      Christoph Lameter 提交于
      swap migration's isolate_lru_page() currently uses an IPI to notify other
      processors that the lru caches need to be drained if the page cannot be
      found on the LRU.  The IPI interrupt may interrupt a processor that is just
      processing lru requests and cause a race condition.
      
      This patch introduces a new function run_on_each_cpu() that uses the
      keventd() to run the LRU draining on each processor.  Processors disable
      preemption when dealing the LRU caches (these are per processor) and thus
      executing LRU draining from another process is safe.
      
      Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
      condition.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      15316ba8
    • R
      [PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb
      Rohit Seth 提交于
      As recently there has been lot of traffic on the right values for batch and
      high water marks for per_cpu_pagelists.  This patch makes these two
      variables configurable through /proc interface.
      
      A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
      controls the fraction of pages at most in each zone that are allocated for
      each per cpu page list.  The min value for this is 8.  It means that we
      don't allow more than 1/8th of pages in each zone to be allocated in any
      single per_cpu_pagelist.
      
      The batch value of each per cpu pagelist is also updated as a result.  It
      is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
      Signed-off-by: NRohit Seth <rohit.seth@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ad4b1fb
    • A
      [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton 提交于
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  THis
      operation requires root permissions.
      
      It won't drop dirty data, so the user should run `sync' first.
      
      Caveats:
      
      a) Holds inode_lock for exorbitant amounts of time.
      
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9d0243bc
  2. 07 1月, 2006 16 次提交