1. 13 Sep, 2005 1 commit
  2. 12 Sep, 2005 1 commit
    • G
      [PATCH] uclinux: add NULL check, 0 end valid check and some more exports to nommu.c · 66aa2b4b
      Greg Ungerer committed
      Move call to get_mm_counter() in update_mem_hiwater() to be
      inside the check for tsk->mm being null. Otherwise you can be
      following a null pointer here. This patch submitted by
      Javier Herrero <jherrero@hvsistemas.es>.
      
      Modify the end check for munmap regions to allow for the
      legacy behavior of 0 being valid. Pretty much all current
      uClinux system libc malloc's pass in 0 as the end point.
      A hard check will fail on these, so change the check so
      that if it is non-zero it must be valid otherwise it fails.
      A passed-in value of 0 will always succeed (as it used to).
      
      Also export a few more mm system functions - to be consistent
      with the VM code exports.
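The two fixes described above can be sketched in user-space C. This is a minimal illustration, not the code from nommu.c: the `demo_` structures and function names are hypothetical stand-ins, and the end check only models the rule "0 is accepted, a non-zero end must match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm and task structures. */
struct demo_mm {
    unsigned long rss;          /* current resident set size */
    unsigned long hiwater_rss;  /* highest rss seen so far */
};

struct demo_task {
    struct demo_mm *mm;         /* NULL for tasks without an address space */
};

/* Read the counter only after confirming that tsk->mm is non-NULL. */
void demo_update_mem_hiwater(struct demo_task *tsk)
{
    assert(tsk != NULL);
    if (tsk->mm) {
        unsigned long rss = tsk->mm->rss;  /* read moved inside the check */
        if (tsk->mm->hiwater_rss < rss)
            tsk->mm->hiwater_rss = rss;
    }
}

/* An end of 0 is accepted for legacy callers; a non-zero end must match. */
bool demo_end_is_valid(unsigned long requested_end, unsigned long region_end)
{
    return requested_end == 0 || requested_end == region_end;
}
```

With this shape, a task whose `mm` is NULL is simply skipped, and a legacy caller passing 0 as the end point is treated the same as a caller passing the exact end of the region.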
      Signed-off-by: Greg Ungerer <gerg@uclinux.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 11 Sep, 2005 5 commits
  4. 10 Sep, 2005 5 commits
    • I
      [PATCH] timer initialization cleanup: DEFINE_TIMER · 8d06afab
      Ingo Molnar committed
      Clean up timer initialization by introducing DEFINE_TIMER à la
      DEFINE_SPINLOCK.  Build and boot-tested on x86.  A similar patch has
      been in the -RT tree for some time.
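The idea of a define-and-initialize macro can be sketched as follows. This is a user-space illustration under stated assumptions: `struct demo_timer`, `DEMO_TIMER_INITIALIZER` and `DEMO_DEFINE_TIMER` are hypothetical names modelled on the pattern the message describes, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified timer structure for illustration only. */
struct demo_timer {
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

/* Static initializer, so no run-time init call is needed. */
#define DEMO_TIMER_INITIALIZER(_function, _expires, _data) \
    { .expires = (_expires), .function = (_function), .data = (_data) }

/* Declare and initialize a timer in one statement. */
#define DEMO_DEFINE_TIMER(_name, _function, _expires, _data) \
    struct demo_timer _name = DEMO_TIMER_INITIALIZER(_function, _expires, _data)

unsigned long demo_last_data;

void demo_timeout(unsigned long data)
{
    demo_last_data = data;  /* record which timer fired */
}

/* One line replaces a declaration plus separate field assignments. */
DEMO_DEFINE_TIMER(demo_watchdog, demo_timeout, 0, 42);

/* Invoke the timer's callback with its stored argument. */
void demo_fire(struct demo_timer *t)
{
    assert(t != NULL && t->function != NULL);
    t->function(t->data);
}
```

The benefit of this style is that the timer is fully initialized at compile time, so there is no window in which an uninitialized timer can be used.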
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • P
      [PATCH] update kfree, vfree, and vunmap kerneldoc · 80e93eff
      Pekka Enberg committed
      This patch clarifies NULL handling of kfree() and vfree().  In addition,
      the wording of the calling context restriction for vfree() and vunmap() is
      changed from "may not" to "must not."
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • C
      [PATCH] Numa-aware slab allocator V5 · e498be7d
      Christoph Lameter committed
      The NUMA API change that introduced kmalloc_node was accepted for
      2.6.12-rc3.  Now it is possible to do slab allocations on a node to
      localize memory structures.  This API was used by the pageset localization
      patch and the block layer localization patch now in mm.  The existing
      kmalloc_node is slow since it simply searches through all pages of the slab
      to find a page that is on the node requested.  The two patches do a one
      time allocation of slab structures at initialization and therefore the
      speed of kmalloc_node does not matter.
      
      This patch allows kmalloc_node to be as fast as kmalloc by introducing node
      specific page lists for partial, free and full slabs.  Slab allocation
      improves in a NUMA system so that we are seeing a performance gain in AIM7
      of about 5% with this patch alone.
      
      More NUMA localizations are possible if kmalloc_node operates in a fast
      way like kmalloc.
      
      Test run on a 32p system with 32G RAM.
      
      w/o patch
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      485.36  100       485.3640     11.99      1.91   Sat Apr 30 14:01:51 2005
        100    26582.63   88       265.8263     21.89    144.96   Sat Apr 30 14:02:14 2005
        200    29866.83   81       149.3342     38.97    286.08   Sat Apr 30 14:02:53 2005
        300    33127.16   78       110.4239     52.71    426.54   Sat Apr 30 14:03:46 2005
        400    34889.47   80        87.2237     66.72    568.90   Sat Apr 30 14:04:53 2005
        500    35654.34   76        71.3087     81.62    714.55   Sat Apr 30 14:06:15 2005
        600    36460.83   75        60.7681     95.77    853.42   Sat Apr 30 14:07:51 2005
        700    35957.00   75        51.3671    113.30    990.67   Sat Apr 30 14:09:45 2005
        800    33380.65   73        41.7258    139.48   1140.86   Sat Apr 30 14:12:05 2005
        900    35095.01   76        38.9945    149.25   1281.30   Sat Apr 30 14:14:35 2005
       1000    36094.37   74        36.0944    161.24   1419.66   Sat Apr 30 14:17:17 2005
      
      w/patch
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      484.27  100       484.2736     12.02      1.93   Sat Apr 30 15:59:45 2005
        100    28262.03   90       282.6203     20.59    143.57   Sat Apr 30 16:00:06 2005
        200    32246.45   82       161.2322     36.10    282.89   Sat Apr 30 16:00:42 2005
        300    37945.80   83       126.4860     46.01    418.75   Sat Apr 30 16:01:28 2005
        400    40000.69   81       100.0017     58.20    561.48   Sat Apr 30 16:02:27 2005
        500    40976.10   78        81.9522     71.02    696.95   Sat Apr 30 16:03:38 2005
        600    41121.54   78        68.5359     84.92    834.86   Sat Apr 30 16:05:04 2005
        700    44052.77   78        62.9325     92.48    971.53   Sat Apr 30 16:06:37 2005
        800    41066.89   79        51.3336    113.38   1111.15   Sat Apr 30 16:08:31 2005
        900    38918.77   79        43.2431    134.59   1252.57   Sat Apr 30 16:10:46 2005
       1000    41842.21   76        41.8422    139.09   1392.33   Sat Apr 30 16:13:05 2005
      
      These are measurements taken directly after boot and show a greater
      improvement than 5%.  However, the performance improvements become less
      over time if the AIM7 runs are repeated and settle down at around 5%.
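The size of the improvement right after boot can be checked from the jobs/min columns of the two tables above. The sketch below copies a few rows and computes the relative change; the helper name `percent_gain` and the choice of rows are illustrative assumptions, not part of the original message.

```c
#include <assert.h>

/* Percentage change from a baseline throughput to a patched throughput. */
double percent_gain(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}

/* jobs/min figures copied from the "w/o patch" and "w/patch" tables. */
struct aim7_row {
    int tasks;
    double without_patch;
    double with_patch;
};

const struct aim7_row aim7_rows[] = {
    {    1,   485.36,   484.27 },
    {  100, 26582.63, 28262.03 },
    {  500, 35654.34, 40976.10 },
    { 1000, 36094.37, 41842.21 },
};
```

For example, `percent_gain(36094.37, 41842.21)` for the 1000-task row comes to a little under 16%, while the single-task row is essentially unchanged, which matches the statement that the post-boot numbers exceed the steady-state figure of about 5%.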
      
      Links to earlier discussions:
      http://marc.theaimsgroup.com/?t=111094594500003&r=1&w=2
      http://marc.theaimsgroup.com/?t=111603406600002&r=1&w=2
      
      Changelog V4-V5:
      - alloc_arraycache and alloc_aliencache take node parameter instead of cpu
      - fix initialization so that nodes without cpus are properly handled.
      - simplify code in kmem_cache_init
      - patch against Andrew's temp mm3 release
      - Add Shai to credits
      - fallback to __cache_alloc from __cache_alloc_node if the node's cache
        is not available yet.
      
      Changelog V3-V4:
      - Patch against 2.6.12-rc5-mm1
      - Cleanup patch integrated
      - More and better use of for_each_node and for_each_cpu
      - GCC 2.95 fix (do not use [] use [0])
      - Correct determination of INDEX_AC
      - Remove hack to cause an error on platforms that have no CONFIG_NUMA but nodes.
      - Remove list3_data and list3_data_ptr macros for better readability
      
      Changelog V2-V3:
      - Made to patch against 2.6.12-rc4-mm1
      - Revised bootstrap mechanism so that larger size kmem_list3 structs can be
        supported. Do a generic solution so that the right slab can be found
        for the internal structs.
      - use for_each_online_node
      
      Changelog V1-V2:
      - Batching for freeing of wrong-node objects (alien caches)
      - Locking changes and NUMA #ifdefs as requested by Manfred
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
      Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • S
      [PATCH] tmpfs: Enable atomic inode security labeling · 570bc1c2
      Stephen Smalley committed
      This patch modifies tmpfs to call the inode_init_security LSM hook to set
      up the incore inode security state for new inodes before the inode becomes
      accessible via the dcache.
      
      As there is no underlying storage of security xattrs in this case, it is
      not necessary for the hook to return the (name, value, len) triple to the
      tmpfs code, so this patch also modifies the SELinux hook function to
      correctly handle the case where the (name, value, len) pointers are NULL.
      
      The hook call is needed in tmpfs in order to support proper security
      labeling of tmpfs inodes (e.g.  for udev with tmpfs /dev in Fedora).  With
      this change in place, we should then be able to remove the
      security_inode_post_create/mkdir/...  hooks safely.
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • M
      [PATCH] update filesystems for new delete_inode behavior · fef26658
      Mark Fasheh committed
      Update the file systems in fs/ implementing a delete_inode() callback to
      call truncate_inode_pages().  One implementation note: In developing this
      patch I put the calls to truncate_inode_pages() at the very top of those
      filesystems' delete_inode() callbacks in order to retain the previous
      behavior.  I'm guessing that some of those could probably be optimized.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 09 Sep, 2005 1 commit
  6. 08 Sep, 2005 8 commits
    • P
      [PATCH] introduce and use kzalloc · dd392710
      Pekka J Enberg committed
      This patch introduces a kzalloc wrapper and converts kernel/ to use it.  It
      saves a little program text.
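The wrapper pattern can be illustrated in user space. This is a sketch under stated assumptions: `demo_zalloc` is a hypothetical name built on `malloc` and `memset`, standing in for an allocate-then-zero helper; it is not the kernel's kzalloc and takes no allocation flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate size bytes and zero them, so callers no longer repeat the
 * allocate-then-memset sequence at every call site.
 */
void *demo_zalloc(size_t size)
{
    void *ptr;

    assert(size > 0);  /* this sketch does not model zero-size requests */
    ptr = malloc(size);
    if (ptr)
        memset(ptr, 0, size);
    return ptr;
}
```

Each converted call site shrinks from two calls to one, which is where the small saving in program text comes from.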
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • P
      [PATCH] cpusets: confine oom_killer to mem_exclusive cpuset · ef08e3b4
      Paul Jackson committed
      Now the real motivation for this cpuset mem_exclusive patch series seems
      trivial.
      
      This patch keeps a task in or under one mem_exclusive cpuset from provoking an
      oom kill of a task under a non-overlapping mem_exclusive cpuset.  Since only
      interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
      containment, there is little to gain from oom killing a task under a
      non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
      allocation must come from disjoint memory nodes.
      
      This patch enables configuring a system so that a runaway job under one
      mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
      that might be using very high compute and memory resources for a prolonged
      time.
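The overlap test described above can be sketched with node bitmasks. All names here (`struct demo_cpuset`, `demo_nearest_exclusive`, `demo_oom_may_consider`) are hypothetical, and the model is deliberately simplified: each cpuset holds a bitmask of allowed memory nodes and a pointer to its parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cpuset: bit n set means memory node n is allowed. */
struct demo_cpuset {
    unsigned long mems_allowed;
    bool mem_exclusive;
    struct demo_cpuset *parent;  /* NULL for the root cpuset */
};

/* Walk up from cs to the nearest mem_exclusive cpuset (or the root). */
const struct demo_cpuset *demo_nearest_exclusive(const struct demo_cpuset *cs)
{
    assert(cs != NULL);
    while (!cs->mem_exclusive && cs->parent)
        cs = cs->parent;
    return cs;
}

/* A candidate is considered only if the two exclusive node sets overlap. */
bool demo_oom_may_consider(const struct demo_cpuset *current_cs,
                           const struct demo_cpuset *candidate_cs)
{
    unsigned long a = demo_nearest_exclusive(current_cs)->mems_allowed;
    unsigned long b = demo_nearest_exclusive(candidate_cs)->mems_allowed;

    return (a & b) != 0;
}
```

In this model, a job under one mem_exclusive cpuset that runs out of memory never selects a victim whose nearest mem_exclusive cpuset uses disjoint nodes, since killing it would free no memory the job could use.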
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • P
      [PATCH] cpusets: formalize intermediate GFP_KERNEL containment · 9bf2229f
      Paul Jackson committed
      This patch makes use of the previously underutilized cpuset flag
      'mem_exclusive' to provide what amounts to another layer of memory placement
      resolution.  With this patch, there are now the following four layers of
      memory placement available:
      
       1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
       2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
       3) The current task's cpuset (GFP_USER allocations constrained to here), and
       4) Specific node placement, using mbind and set_mempolicy.
      
      These nest - each layer is a subset (same or within) of the previous.
      
      Layer (2) above is new, with this patch.  The call used to check whether a
      zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
      extended to take a gfp_mask argument, and its logic is extended, in the case
      that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
      hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
      placement is allowed.  The definition of GFP_USER, which used to be identical
      to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
      cpuset_gfp_hardwall_flag patch.
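The extended check can be sketched as a small user-space function. This is a simplified model under stated assumptions: `struct demo_cpuset` and `demo_node_allowed` are hypothetical names, the `hardwall` argument stands in for the __GFP_HARDWALL bit of the gfp mask, and layer (1), the interrupt and GFP_ATOMIC escape to the whole system, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cpuset: bit n set means memory node n is allowed. */
struct demo_cpuset {
    unsigned long mems_allowed;
    bool mem_exclusive;
    struct demo_cpuset *parent;  /* NULL for the root cpuset */
};

/*
 * Decide whether an allocation by a task in cs may use the given node.
 * hardwall == true models a GFP_USER-style request, false a
 * GFP_KERNEL-style request.
 */
bool demo_node_allowed(const struct demo_cpuset *cs, int node, bool hardwall)
{
    unsigned long bit;

    assert(cs != NULL);
    assert(node >= 0 && node < (int)(sizeof(unsigned long) * 8));
    bit = 1UL << node;

    if (cs->mems_allowed & bit)
        return true;   /* layer 3: inside the task's own cpuset */
    if (hardwall)
        return false;  /* hardwall requests never escape */

    /* Layer 2: look up the hierarchy for the nearest mem_exclusive cpuset. */
    while (!cs->mem_exclusive && cs->parent)
        cs = cs->parent;
    return (cs->mems_allowed & bit) != 0;
}
```

The ordering of the checks reflects the nesting of the layers: a node in the task's own cpuset is always acceptable, and only requests without the hardwall restriction get the second, wider lookup.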
      
      GFP_ATOMIC and GFP_KERNEL allocations will stay within the current task's
      cpuset, so long as any node therein is not too tight on memory, but will
      escape to the larger layer, if need be.
      
      The intended use is to allow something like a batch manager to handle several
      jobs, each job in its own cpuset, but using common kernel memory for caches
      and such.  Swapper and oom_kill activity is also constrained to Layer (2).  A
      task in or below one mem_exclusive cpuset should not cause swapping on nodes
      in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
      task in another such cpuset.  Heavy use of kernel memory for i/o caching and
      such by one job should not impact the memory available to jobs in other
      non-overlapping mem_exclusive cpusets.
      
      This patch enables providing hardwall, inescapable cpusets for memory
      allocations of each job, while sharing kernel memory allocations between
      several jobs, in an enclosing mem_exclusive cpuset.
      
      Like Dinakar's patch earlier to enable administering sched domains using the
      cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
      that had previously done nothing much useful other than restrict what cpuset
      configurations were allowed.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • P
      [PATCH] cpusets: oom_kill tweaks · a49335cc
      Paul Jackson committed
      This patch series extends the use of the cpuset attribute 'mem_exclusive'
      to support cpuset configurations that:
       1) allow GFP_KERNEL allocations to come from a potentially larger
          set of memory nodes than GFP_USER allocations, and
       2) can constrain the oom killer to tasks running in cpusets in
          a specified subtree of the cpuset hierarchy.
      
      Here's an example usage scenario.  For a few hours or more, a large NUMA
      system at a University is to be divided in two halves, with a bunch of student
      jobs running in half the system under some form of batch manager, and with a
      big research project running in the other half.  Each of the student jobs is
      placed in a small cpuset, but should share the classic Unix time share
      facilities, such as buffered pages of files in /bin and /usr/lib.  The big
      research project wants no interference whatsoever from the student jobs, and
      has highly tuned, unusual memory and i/o patterns that intend to make full use
      of all the main memory on the nodes available to it.
      
      In this example, we have two big sibling cpusets, one of which is further
      divided into a more dynamic set of child cpusets.
      
      We want kernel memory allocations constrained by the two big cpusets, and user
      allocations constrained by the smaller child cpusets where present.  And we
      require that the oom killer not operate across the two halves of this system,
      or else the first time a student job runs amuck, the big research project will
      likely be first in line to get shot.
      
      Tweaking /proc/<pid>/oom_adj is not ideal -- if the big research project
      really does run amuck allocating memory, it should be shot, not some other
      task outside the research project's mem_exclusive cpuset.
      
      I propose to extend the use of the 'mem_exclusive' flag of cpusets to manage
      such scenarios.  Let memory allocations for user space (GFP_USER) be
      constrained by a task's current cpuset, but memory allocations for kernel space
      (GFP_KERNEL) be constrained by the nearest mem_exclusive ancestor of the
      current cpuset, even though kernel space allocations will still _prefer_ to
      remain within the current task's cpuset, if memory is easily available.
      
      Let the oom killer be constrained to consider only tasks that are in
      overlapping mem_exclusive cpusets (it won't help much to kill a task that
      normally cannot allocate memory on any of the same nodes as the ones on which
      the current task can allocate.)
      
      The current constraints imposed on setting mem_exclusive are unchanged.  A
      cpuset may only be mem_exclusive if its parent is also mem_exclusive, and a
      mem_exclusive cpuset may not overlap any of its siblings' memory nodes.
      
      This patch was presented on linux-mm in early July 2005, though did not
      generate much feedback at that time.  It has been built for a variety of
      arch's using cross tools, and built, booted and tested for function on SN2
      (ia64).
      
      There are 4 patches in this set:
        1) Some minor cleanup, and some improvements to the code layout
           of one routine to make subsequent patches cleaner.
        2) Add another GFP flag - __GFP_HARDWALL.  It marks memory
           requests for USER space, which are tightly confined by the
           current task's cpuset.
        3) Now memory requests (such as KERNEL) that are not marked HARDWALL can,
           if short on memory, look in the potentially larger pool of memory
           defined by the nearest mem_exclusive ancestor cpuset of the current
           task's cpuset.
        4) Finally, modify the oom killer to skip any task whose mem_exclusive
           cpuset doesn't overlap ours.
      
      Patch (1), the one time I looked on an SN2 (ia64) build, actually saved 32
      bytes of kernel text space.  Patch (2) has no effect on the size of kernel
      text space (it just adds a preprocessor flag).  Patches (3) and (4) added
      about 600 bytes each of kernel text space, mostly in kernel/cpuset.c, which
      matters only if CONFIG_CPUSET is enabled.
      
      This patch:
      
      This patch applies a few comment and code cleanups to mm/oom_kill.c prior to
      applying a few small patches to improve cpuset management of memory placement.
      
      The comment changed in oom_kill.c was seriously misleading.  The code layout
      change in select_bad_process() makes room for adding another condition on
      which a process can be spared the oom killer (see the subsequent
      cpuset_nodes_overlap patch for this addition).
      
      Also a couple typos and spellos that bugged me, while I was here.
      
      This patch should have no material effect.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • R
      [PATCH] Additions to .data.read_mostly section · 6c231b7b
      Ravikiran G Thirumalai committed
      Mark variables which are usually accessed for reads with __read_mostly.
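The annotation can be sketched with a compiler section attribute. This is an illustration under stated assumptions: `demo_read_mostly` is a hypothetical macro, the section attribute is a GCC/Clang extension, and the grouping only takes effect on ELF targets; elsewhere the macro expands to nothing.

```c
#include <assert.h>

/*
 * Place annotated variables in a dedicated data section so that rarely
 * written data is grouped together, away from frequently written data.
 * Falls back to a no-op where the attribute form is not applicable.
 */
#if defined(__GNUC__) && defined(__ELF__)
#define demo_read_mostly __attribute__((__section__(".data.read_mostly")))
#else
#define demo_read_mostly
#endif

/* Set once at start-up and only read afterwards. */
int demo_cache_line_size demo_read_mostly = 64;

int demo_get_cache_line_size(void)
{
    assert(demo_cache_line_size > 0);
    return demo_cache_line_size;
}
```

The annotation follows the variable name, and code that uses the variable does not change; only its placement in the final image does.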
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • S
      [PATCH] readahead: reset cache_hit earlier · 3b30bbd9
      Steven Pratt committed
      We don't reset the cache hit count until after readahead does a successful
      readahead.  This seems to leave a corner case open where we miss in cache,
      but don't restart the readahead right away.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • C
      cdb3826b
    • C
      [PATCH] More __read_mostly variables · c3d8c141
      Christoph Lameter committed
      Move some more frequently read variables that showed up during some of our
      performance tests as sometimes ending up in hot cachelines to the
      read_mostly section.
      
      Fix: Move the __read_mostly from before hpet_usec_quotient to follow the
      variable like the other uses of __read_mostly.
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Christoph Lameter <christoph@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 05 Sep, 2005 19 commits