1. 08 May 2007, 1 commit
    • SLUB core · 81819f0f
      Christoph Lameter authored
      This is a new slab allocator which was motivated by the complexity of the
      existing code in mm/slab.c. It attempts to address a variety of concerns
      with the existing implementation.
      
      A. Management of object queues
      
         A particular concern was the complex management of the numerous object
         queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
         each allocating CPU and use objects from a slab directly instead of
         queueing them up.
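
         A minimal sketch of the idea only (field and function names are made
         up for illustration and are not the code in this patch):

            /* Each CPU owns one active slab; allocation pops the next free
             * object off that slab's freelist.  The free pointer is stored
             * inside the free object itself, so no separate queue is needed. */
            struct cpu_slab {
                    void **freelist;        /* next free object in the active slab */
                    struct page *page;      /* slab page this CPU allocates from */
            };

            static inline void *alloc_from_cpu_slab(struct cpu_slab *c)
            {
                    void **object = c->freelist;

                    if (!object)
                            return NULL;    /* slab exhausted: take the slow path */
                    c->freelist = *object;  /* follow the embedded free pointer */
                    return object;
            }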
      
      B. Storage overhead of object queues
      
         SLAB object queues exist per node and per CPU. The alien cache queue even
         has a queue array that contains a queue for each processor on each
         node. For very large systems the number of queues and the number of
         objects that may be caught in those queues grows exponentially. On our
         systems with 1k nodes / processors we have several gigabytes just tied up
         for storing references to objects in those queues. This does not include
         the objects that could be on those queues. One fears that the whole
         memory of the machine could one day be consumed by those queues.
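
         A rough back-of-the-envelope illustration (all numbers below are
         assumed for the example, not measurements from those systems):

            #include <stdio.h>

            int main(void)
            {
                    unsigned long nodes = 1024, cpus = 1024; /* "1k nodes / processors" */
                    unsigned long depth  = 12;               /* assumed queue depth */
                    unsigned long caches = 50;               /* assumed slab caches */
                    unsigned long ptrs   = nodes * cpus * depth * caches;

                    printf("queued object pointers: %lu (~%llu GB of pointers)\n",
                           ptrs, (unsigned long long)ptrs * sizeof(void *) >> 30);
                    return 0;
            }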
      
      C. SLAB meta data overhead
      
         SLAB has overhead at the beginning of each slab. This means that data
         cannot be naturally aligned at the beginning of a slab block. SLUB keeps
         all metadata in the corresponding struct page. Objects can be naturally
         aligned in the slab. For example, a 128-byte object will be aligned at
         128-byte boundaries and can fit tightly into a 4k page with no bytes
         left over. SLAB cannot do this.
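
         For instance, assuming the usual 4k page size:

            /* Illustration only: 128-byte objects tile a 4k slab exactly. */
            #include <stdio.h>

            int main(void)
            {
                    const unsigned long page_size = 4096;  /* assumed 4k page */
                    const unsigned long object    = 128;

                    printf("objects per slab: %lu, bytes left over: %lu\n",
                           page_size / object, page_size % object); /* 32, 0 */
                    return 0;
            }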
      
      D. SLAB has a complex cache reaper
      
         SLUB does not need a cache reaper on UP systems. On SMP systems
         the per-CPU slab may be pushed back onto the partial list, but that
         operation is simple and does not require an iteration over a list
         of objects. SLAB expires per-CPU, shared and alien object queues
         during cache reaping, which may cause strange holdoffs.
      
      E. SLAB has complex NUMA policy layer support
      
         SLUB pushes NUMA policy handling into the page allocator. This means that
         allocation is coarser (SLUB does interleave on a page level) but that
         situation was also present before 2.6.13. SLAB's application of
         policies to individual slab objects is certainly a performance
         concern due to the frequent references to memory policies, which may
         lead to a sequence of objects coming from one node after another. SLUB
         will get a slab full of objects from one node and then will switch to
         the next.
      
      F. Reduction of the size of partial slab lists
      
         SLAB has per node partial lists. This means that over time a large
         number of partial slabs may accumulate on those lists. These can
         only be reused if allocations occur on specific nodes. SLUB has a global
         pool of partial slabs and will consume slabs from that pool to
         decrease fragmentation.
      
      G. Tunables
      
         SLAB has sophisticated tuning abilities for each slab cache. One can
         manipulate the queue sizes in detail. However, filling the queues still
         requires the use of a spin lock to check out slabs. SLUB has a global
         parameter (slub_min_order) for tuning. Increasing the minimum slab
         order can decrease the locking overhead. The bigger the slab order, the
         fewer page movements between the per-CPU and partial lists occur, and
         the better SLUB will scale.
      
      H. Slab merging

         We often have slab caches with similar parameters. SLUB detects those
         at boot and merges them into the corresponding general caches. This
         leads to more effective memory use. About 50% of all caches can
         be eliminated through slab merging. This will also decrease
         slab fragmentation because partially allocated slabs can be filled
         up again. Slab merging can be switched off by specifying
         slub_nomerge at boot.
      
         Note that merging can expose heretofore unknown bugs in the kernel
         because corrupted objects may now be placed differently and corrupt
         different neighboring objects. Enable sanity checks to find those.
      
      I. Diagnostics

         The current slab diagnostics are difficult to use and require a
         recompilation of the kernel. SLUB contains debugging code that
         is always available (but is kept out of the hot code paths).
         SLUB diagnostics can be enabled via the "slub_debug" option.
         Parameters can be specified to select a single slab cache or a
         group of slab caches for diagnostics. This means that the system
         runs with the usual performance and it is much more likely that
         race conditions can be reproduced.
      
      J. Resiliency

         If basic sanity checks are on then SLUB is capable of detecting
         common error conditions and recovering as well as possible to allow
         the system to continue.
      
      K. Tracing

         Tracing can be enabled via the slub_debug=T,<slabcache> option
         during boot. SLUB will then log all actions on that slab cache
         and dump the object contents on free.
      
      L. On-demand DMA cache creation

         Generally DMA caches are not needed. If kmalloc() is used with
         __GFP_DMA then only the single slab cache that is needed is created.
         For systems that have no ZONE_DMA requirement the support is
         completely eliminated.
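
         A sketch of the kind of caller that would trigger creation of the
         DMA kmalloc cache (the helper name is made up; kmalloc() and
         GFP_DMA are the existing interfaces):

            #include <linux/slab.h>
            #include <linux/gfp.h>

            /* Only a caller like this ever needs a DMA slab cache. */
            static void *alloc_isa_dma_buffer(size_t len)
            {
                    return kmalloc(len, GFP_KERNEL | GFP_DMA);
            }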
      
      M. Performance increase

         Some benchmarks have shown speed improvements on kernbench in the
         range of 5-10%. The locking overhead of SLUB is based on the
         underlying base allocation size. If we can reliably allocate
         larger order pages then it is possible to increase SLUB
         performance much further. The anti-fragmentation patches may
         enable further performance increases.
      
      Tested on:
      i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
      
      SLUB Boot options
      
      slub_nomerge		Disable merging of slabs
      slub_min_order=x	Require a minimum order for slab caches. This
      			increases the managed chunk size and therefore
      			reduces metadata and locking overhead.
      slub_min_objects=x	Minimum objects per slab. Default is 8.
      slub_max_order=x	Avoid generating slabs larger than order specified.
      slub_debug		Enable all diagnostics for all caches
      slub_debug=<options>	Enable selective options for all caches
      slub_debug=<o>,<cache>	Enable selective options for a certain set of
      			caches
      
      Available Debug options
      F		Double Free checking, sanity and resiliency
      R		Red zoning
      P		Object / padding poisoning
      U		Track last free / alloc
      T		Trace all allocs / frees (only use for individual slabs).
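
      An example combining these options (the cache name "dentry" is only an
      illustration of the <cache> argument):

      	slub_min_order=1 slub_debug=FPU,dentry

      on the kernel command line would require order-1 slabs and enable
      double free checking, poisoning and alloc/free tracking for that one
      cache, while all other caches keep running at full speed.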
      
      To use SLUB: Apply this patch and then select SLUB as the default slab
      allocator.
      
      [hugh@veritas.com: fix an oops-causing locking error]
      [akpm@linux-foundation.org: various stupid cleanups and small fixes]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 03 May 2007, 5 commits
  3. 13 March 2007, 1 commit
    • [PATCH] Fix VMI and COMPAT_VDSO for 2.6.21 · b6bc5d71
      Zachary Amsden authored
      VMI is broken under COMPAT_VDSO, as Xen and other non-hardware-assisted
      hypervisors will be.  I have been working on a fix for this which works
      for older glibcs that panic when the new relocatable VDSO is used.
      
      However, I believe at this time that the fix is going to be too radical
      to consider at this stage in the release of 2.6.21.  We don't expect
      this config option to be turned on by vendors for new distributions, so
      at this point we are willing to drop support for it when VMI is compiled
      in, and work on a patch for 2.6.22 which more fully addresses the
      problem.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 March 2007, 4 commits
  5. 05 March 2007, 1 commit
  6. 20 February 2007, 1 commit
  7. 17 February 2007, 4 commits
  8. 13 February 2007, 2 commits
    • [PATCH] i386: vMI timer patches · bbab4f3b
      Zachary Amsden authored
      VMI timer code.  It works by taking over the local APIC clock when the APIC
      is configured, which requires a couple of hooks into the APIC code.  The backend
      timer code could be commonized into the timer infrastructure, but there are
      some pieces missing (stolen time, in particular), and the exact semantics of
      when to do accounting for NO_IDLE need to be shared between different
      hypervisors as well.  So for now, VMI timer is a separate module.
      
      [Adrian Bunk: cleanups]
      
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd
      Zachary Amsden authored
      Fairly straightforward implementation of VMI backend for paravirt-ops.
      
      [Adrian Bunk: some cleanups]
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
  9. 12 February 2007, 1 commit
    • [PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA · 5ac6da66
      Christoph Lameter authored
      As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
      channel management.  Other functionality may still expect GFP_DMA to
      provide memory below 16M.  So we need to make sure that CONFIG_ZONE_DMA is
      set independently of CONFIG_GENERIC_ISA_DMA.  Undo the modifications to
      mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
      these explicitly in each arch's Kconfig.
      
      Reviews must occur for each arch in order to determine if ZONE_DMA can be
      switched off.  It can only be switched off if we know that all devices
      supported by a platform are capable of performing DMA transfers to all of
      memory (Some arches already support this: uml, avr32, sh, sh64, parisc and
      IA64/Altix).
      
      In order to switch ZONE_DMA off conditionally, one would have to establish
      a scheme by which one can ensure that no drivers are enabled that are only
      capable of doing I/O to a part of memory, or one needs to provide an
      alternate means of performing an allocation from a specific range of memory
      (like that provided by alloc_pages_range()) and ensure that all drivers use
      that call.  In that case the arch's dma_alloc_coherent() may need to be
      modified to call alloc_pages_range() instead of relying on GFP_DMA.
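
      A sketch of the kind of allocation that still depends on ZONE_DMA even
      when ISA DMA channel management is compiled out (the helper is made up;
      alloc_pages() and GFP_DMA are the existing interfaces):

         #include <linux/gfp.h>

         /* A device limited to 24-bit DMA addressing must get its pages
          * from ZONE_DMA (below 16M on x86). */
         static struct page *alloc_low_pages(unsigned int order)
         {
                 return alloc_pages(GFP_KERNEL | GFP_DMA, order);
         }
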
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 06 January 2007, 1 commit
    • [PATCH] i386: Restore CONFIG_PHYSICAL_START option · dd0ec16f
      Vivek Goyal authored
      o Relocatable bzImage support got rid of the CONFIG_PHYSICAL_START option
        on the assumption that it was no longer required, since people can build
        the second kernel as relocatable and load it anywhere, so the need to
        compile the kernel for a custom address was gone. But Magnus uses vmlinux
        images for the second kernel in a Xen environment and wants to continue
        to do so.
      
      o Restoring the CONFIG_PHYSICAL_START option for the time being. I think
        down the line we can get rid of it.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 10 December 2006, 1 commit
    • [PATCH] x86-64: no paravirt for X86_VOYAGER or X86_VISWS · f0f32fcc
      Randy Dunlap authored
      Since Voyager and Visual WS already define ARCH_SETUP,
      it looks like PARAVIRT shouldn't be offered for them.
      
      In file included from arch/i386/kernel/setup.c:63:
      include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefined
      In file included from include/asm/msr.h:5,
                       from include/asm/processor.h:17,
                       from include/asm/thread_info.h:16,
                       from include/linux/thread_info.h:21,
                       from include/linux/preempt.h:9,
                       from include/linux/spinlock.h:49,
                       from include/linux/capability.h:45,
                       from include/linux/sched.h:46,
                       from arch/i386/kernel/setup.c:26:
      include/asm/paravirt.h:163:1: warning: this is the location of the previous definition
      In file included from arch/i386/kernel/setup.c:63:
      include/asm-i386/mach-visws/setup_arch.h:8:1: warning: "ARCH_SETUP" redefined
      In file included from include/asm/msr.h:5,
                       from include/asm/processor.h:17,
                       from include/asm/thread_info.h:16,
                       from include/linux/thread_info.h:21,
                       from include/linux/preempt.h:9,
                       from include/linux/spinlock.h:49,
                       from include/linux/capability.h:45,
                       from include/linux/sched.h:46,
                       from arch/i386/kernel/setup.c:26:
      include/asm/paravirt.h:163:1: warning: this is the location of the previous definition
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
  12. 09 December 2006, 1 commit
    • [PATCH] Generic BUG for i386 · 91768d6c
      Jeremy Fitzhardinge authored
      This makes i386 use the generic BUG machinery.  There are no functional
      changes from the old i386 implementation.
      
      The main advantage in using the generic BUG machinery for i386 is that the
      inlined overhead of BUG is just the ud2a instruction; the file+line(+function)
      information is no longer inlined into the instruction stream.  This reduces
      cache pollution, and makes disassembly work properly.
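
      Callers are unchanged; a sketch of typical use (the surrounding function
      is made up for illustration):

         #include <linux/bug.h>

         static void consume_item(void *item)
         {
                 /* With generic BUG this inlines only the ud2a trap; the
                  * file/line record lives in a separate table section. */
                 BUG_ON(item == NULL);
                 /* ... use item ... */
         }
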
      Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 07 December 2006, 6 commits
  14. 04 October 2006, 1 commit
    • Attack of "the the"s in arch · 4b3f686d
      Matt LaPlante authored
      The patch below corrects multiple occurrences of "the the"
      typos across several files, both in source comments and Kconfig files.
      There is no actual code changed, only text.  Note this only affects the /arch
      directory, and I believe I could find many more elsewhere. :)
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
  15. 02 October 2006, 1 commit
    • [PATCH] Kprobes: Make kprobe modules more portable · 3a872d89
      Ananth N Mavinakayanahalli authored
      In an effort to make kprobe modules more portable, here is a patch that:
      
      o Introduces the "symbol_name" field to struct kprobe (a usage sketch
        follows this list). The symbol->address resolution now happens in the
        kernel in an architecture-agnostic manner. 64-bit powerpc users no
        longer have to specify the ".symbols"
      o Introduces the "offset" field to struct kprobe to allow a user to
        specify an offset into a symbol.
      o The legacy mechanism of specifying the kprobe.addr is still supported.
        However, if both the kprobe.addr and kprobe.symbol_name are specified,
        probe registration fails with an -EINVAL.
      o The symbol resolution code uses kallsyms_lookup_name(). So
        CONFIG_KPROBES now depends on CONFIG_KALLSYMS
      o Apparently kprobe modules were the only legitimate out-of-tree user of
        the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
        happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
      o Modify tcp_probe.c, which uses the kprobe interface, so as to make it
        work on multiple platforms (in its earlier form, the code wouldn't
        work, say, on powerpc)
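
      A sketch of the new usage (the probed symbol, module boilerplate and
      lack of error handling are for illustration only):

         #include <linux/module.h>
         #include <linux/kprobes.h>

         static int handler_pre(struct kprobe *p, struct pt_regs *regs)
         {
                 printk(KERN_INFO "kprobe hit: %s+0x%x\n",
                        p->symbol_name, p->offset);
                 return 0;
         }

         static struct kprobe kp = {
                 .symbol_name = "do_fork",   /* resolved in-kernel, no address needed */
                 .offset      = 0,           /* optional offset into the symbol */
                 .pre_handler = handler_pre,
         };

         static int __init kp_init(void)  { return register_kprobe(&kp); }
         static void __exit kp_exit(void) { unregister_kprobe(&kp); }

         module_init(kp_init);
         module_exit(kp_exit);
         MODULE_LICENSE("GPL");
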
      Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 27 September 2006, 3 commits
  17. 26 September 2006, 5 commits
  18. 28 August 2006, 1 commit