  1. 01 Feb 2013 (1 commit)
  2. 19 Dec 2012 (3 commits)
    • slub: slub-specific propagation changes · 107dab5c
      Committed by Glauber Costa
      SLUB allows us to tune a particular cache's behavior with sysfs-based
      tunables.  When creating a new memcg cache copy, we'd like to preserve any
      tunables the parent cache already had.
      
      This can be done by tapping into the store attribute function provided by
      the allocator.  We of course don't need to mess with read-only fields.
      Since the attributes can have multiple types and are stored internally by
      sysfs, the best strategy is to issue a ->show() on the root cache and then
      a ->store() on the memcg cache (see the sketch after this entry).
      
      The drawback is that sysfs can allocate up to a page of buffering for
      show(), which we are likely not to need but also cannot rule out.  To
      avoid always allocating a page for that, we update the caches at store
      time with the maximum attribute size ever stored to the root cache, and
      we then get a buffer big enough to hold it.  The corollary is that if no
      stores happened, nothing will be propagated.
      
      It can also happen that a root cache has its tunables updated during
      normal system operation.  In this case, we propagate the change to all
      caches that are already active.
      
      [akpm@linux-foundation.org: tweak code to avoid __maybe_unused]
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      107dab5c
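      A minimal userspace sketch of the show/store propagation idea described in this entry; struct cache and the cpu_partial attribute here are illustrative stand-ins, not the kernel's types:

      #include <stdio.h>

      struct cache {
          unsigned int cpu_partial;            /* one example tunable */
      };

      /* ->show(): format the current value into buf, return the length */
      static int cpu_partial_show(struct cache *s, char *buf, size_t len)
      {
          return snprintf(buf, len, "%u", s->cpu_partial);
      }

      /* ->store(): parse buf and apply the value */
      static int cpu_partial_store(struct cache *s, const char *buf)
      {
          return sscanf(buf, "%u", &s->cpu_partial) == 1 ? 0 : -1;
      }

      /* Propagate one tunable: show() on the root, store() on the child. */
      static void propagate(struct cache *root, struct cache *child)
      {
          char buf[32];                        /* sized by the largest value ever stored */

          cpu_partial_show(root, buf, sizeof(buf));
          cpu_partial_store(child, buf);
      }

      int main(void)
      {
          struct cache root = { .cpu_partial = 30 }, child = { 0 };

          propagate(&root, &child);
          printf("child.cpu_partial = %u\n", child.cpu_partial);   /* prints 30 */
          return 0;
      }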
    • sl[au]b: allocate objects from memcg cache · d79923fa
      Committed by Glauber Costa
      We are able to match a cache allocation to a particular memcg.  If the
      task doesn't change groups during the allocation itself (a rare event),
      this gives us a good picture of which group is the first to touch a
      cache page.
      
      This patch uses the now-available infrastructure by calling
      memcg_kmem_get_cache() before all the cache allocations (see the sketch
      after this entry).
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d79923fa
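      A userspace sketch of the idea only: hypothetical names stand in for memcg_kmem_get_cache() and struct kmem_cache, and the point shown is just that the allocation is routed through a per-memcg clone before the real allocation happens:

      #include <stdlib.h>

      struct cache {
          size_t object_size;
          struct cache *memcg_clone;    /* per-memcg copy of this cache, if any */
      };

      /* Stand-in for memcg_kmem_get_cache(): route the allocation to the
       * caller's memcg cache when one exists, otherwise use the root cache. */
      static struct cache *memcg_cache_for_caller(struct cache *root, int caller_in_memcg)
      {
          if (caller_in_memcg && root->memcg_clone)
              return root->memcg_clone;
          return root;
      }

      static void *cache_alloc(struct cache *cachep, int caller_in_memcg)
      {
          /* The hook runs before the actual object allocation. */
          cachep = memcg_cache_for_caller(cachep, caller_in_memcg);
          return malloc(cachep->object_size);   /* placeholder for the real slab path */
      }

      int main(void)
      {
          struct cache clone = { .object_size = 64 };
          struct cache root  = { .object_size = 64, .memcg_clone = &clone };

          free(cache_alloc(&root, 1));  /* allocation served from the memcg clone */
          return 0;
      }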
    • slab/slub: struct memcg_params · ba6c496e
      Committed by Glauber Costa
      For the kmem slab controller, we need to record some extra information in
      the kmem_cache structure.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba6c496e
  3. 14 Jun 2012 (1 commit)
    • mm, sl[aou]b: Extract common fields from struct kmem_cache · 3b0efdfa
      Committed by Christoph Lameter
      Define a struct that describes the common fields used in all slab
      allocators.  A slab allocator either uses the common definition (like
      SLOB) or is required to provide members of kmem_cache with the
      definition given.
      
      After that it will be possible to share code that only operates on
      those fields of kmem_cache.
      
      The patch basically takes the SLOB definition of kmem_cache and uses
      its field names for the other allocators.
      
      It also standardizes the names used for basic object lengths in the
      allocators (see the sketch after this entry):
      
      object_size	Struct size specified at kmem_cache_create.  Basically
      		the payload expected to be used by the subsystem.
      
      size		The size the memory allocator reserves for each object.
      		This size is larger than object_size and includes padding,
      		alignment and extra metadata for each object (e.g. for
      		debugging and RCU).
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      3b0efdfa
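      A small sketch of the naming convention described in this entry (illustrative only, not the kernel's struct kmem_cache, and the metadata accounting is elided):

      #include <stdio.h>

      /* object_size is what the caller asked for, size is what the
       * allocator actually reserves per object. */
      struct cache_sizes {
          unsigned int object_size;   /* payload passed to kmem_cache_create() */
          unsigned int size;          /* object_size + padding/alignment/metadata */
          unsigned int align;
      };

      /* e.g. deriving size by rounding the payload up to the cache alignment */
      static unsigned int rounded_size(unsigned int object_size, unsigned int align)
      {
          return (object_size + align - 1) & ~(align - 1);
      }

      int main(void)
      {
          struct cache_sizes s = { .object_size = 44, .align = 8 };

          s.size = rounded_size(s.object_size, s.align);   /* 48, before metadata */
          printf("object_size=%u size=%u\n", s.object_size, s.size);
          return 0;
      }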
  4. 01 Jun 2012 (1 commit)
    • slub: Get rid of the node field · ec3ab083
      Committed by Christoph Lameter
      The node field is always page_to_nid(c->page), so it is rather easy to
      replace (see the sketch after this entry).  Note that there may be
      slightly more overhead in various hot paths due to the need to shift the
      bits out of page->flags.  However, that is mostly compensated for by the
      smaller footprint of the kmem_cache_cpu structure (this patch reduces it
      to 3 words per cache), which allows better caching.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      ec3ab083
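      A toy model of the replacement; the layout of page->flags below is made up, and only the point of deriving the node from the page instead of caching it carries over:

      #include <stdio.h>

      #define NODE_SHIFT  8
      #define NODE_MASK   0xffUL

      struct page_model { unsigned long flags; };
      struct cpu_slab   { struct page_model *page; /* no cached node field */ };

      /* shift the node id out of the page flags when it is actually needed */
      static int page_to_nid_model(const struct page_model *p)
      {
          return (int)((p->flags >> NODE_SHIFT) & NODE_MASK);
      }

      int main(void)
      {
          struct page_model page = { .flags = 3UL << NODE_SHIFT };
          struct cpu_slab c = { .page = &page };

          printf("node = %d\n", page_to_nid_model(c.page));   /* prints 3 */
          return 0;
      }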
  5. 05 Mar 2012 (1 commit)
    • BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Committed by Paul Gortmaker
      If a header file makes use of BUG, BUG_ON, BUILD_BUG_ON, or any other
      BUG variant in a static inline (i.e. not in a #define), then that header
      really should include <linux/bug.h> itself and not just expect it to be
      implicitly present (see the sketch after this entry).
      
      We can make this change risk-free, since if the files using these
      headers didn't already have exposure to linux/bug.h, they would have
      been causing compile failures/warnings.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
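      A minimal sketch of the rule, using a hypothetical header and struct (example_widget.h is not a real kernel file):

      #ifndef _EXAMPLE_WIDGET_H
      #define _EXAMPLE_WIDGET_H

      #include <linux/bug.h>           /* this header uses BUG_ON() below */

      struct widget { int refcount; };

      static inline void widget_get(struct widget *w)
      {
          BUG_ON(w->refcount < 0);     /* BUG_ON in a static inline, not a #define */
          w->refcount++;
      }

      #endif /* _EXAMPLE_WIDGET_H */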
  6. 18 Feb 2012 (1 commit)
  7. 28 Sep 2011 (1 commit)
  8. 20 Aug 2011 (1 commit)
    • slub: per cpu cache for partial pages · 49e22585
      Committed by Christoph Lameter
      Allow filling out the rest of the kmem_cache_cpu cacheline with pointers
      to partial pages.  The partial page list is used in slab_free() to avoid
      taking the per-node lock.
      
      In __slab_alloc() we can then take multiple partial pages off the
      per-node partial list in one go, reducing node lock pressure.
      
      We can also use the per-cpu partial list in slab_alloc() to avoid
      scanning partial lists for pages with free objects.
      
      The main effect of a per-cpu partial list is that the per-node list_lock
      is taken for batches of partial pages instead of individual ones (see
      the sketch after this entry).
      
      Potential future enhancements:
      
      1. The pickup from the partial list could perhaps be done without
         disabling interrupts, with some work.  The free path already puts the
         page into the per-cpu partial list without disabling interrupts.
      
      2. __slab_free() may have some code paths that could use optimization.
      Performance:
      
      				Before		After
      ./hackbench 100 process 200000
      				Time: 1953.047	1564.614
      ./hackbench 100 process 20000
      				Time: 207.176   156.940
      ./hackbench 100 process 20000
      				Time: 204.468	156.940
      ./hackbench 100 process 20000
      				Time: 204.879	158.772
      ./hackbench 10 process 20000
      				Time: 20.153	15.853
      ./hackbench 10 process 20000
      				Time: 20.153	15.986
      ./hackbench 10 process 20000
      				Time: 19.363	16.111
      ./hackbench 1 process 20000
      				Time: 2.518	2.307
      ./hackbench 1 process 20000
      				Time: 2.258	2.339
      ./hackbench 1 process 20000
      				Time: 2.864	2.163
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      49e22585
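      A userspace model of the batching idea only (not SLUB itself): a small per-cpu list of partial pages so the shared, lock-protected per-node list is touched in batches rather than once per page:

      #include <pthread.h>

      #define BATCH 4

      struct partial_page { struct partial_page *next; };

      static struct partial_page *node_partial;              /* shared per-node list */
      static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

      static __thread struct partial_page *cpu_partial;      /* per-"cpu" list */

      /* Refill the per-cpu list: one lock round-trip moves up to BATCH pages. */
      static struct partial_page *get_partial(void)
      {
          if (!cpu_partial) {
              pthread_mutex_lock(&node_lock);
              for (int i = 0; i < BATCH && node_partial; i++) {
                  struct partial_page *p = node_partial;
                  node_partial = p->next;
                  p->next = cpu_partial;
                  cpu_partial = p;
              }
              pthread_mutex_unlock(&node_lock);
          }
          struct partial_page *p = cpu_partial;
          if (p)
              cpu_partial = p->next;
          return p;
      }

      /* Free path: park the page on the per-cpu list, no node lock taken. */
      static void put_partial(struct partial_page *p)
      {
          p->next = cpu_partial;
          cpu_partial = p;
      }

      int main(void)
      {
          static struct partial_page pages[8];

          for (int i = 0; i < 8; i++) {        /* seed the shared per-node list */
              pages[i].next = node_partial;
              node_partial = &pages[i];
          }

          struct partial_page *p = get_partial();  /* one batched refill under the lock */
          put_partial(p);                          /* free path: lock not taken */
          return 0;
      }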
  9. 08 Jul 2011 (1 commit)
  10. 02 Jul 2011 (3 commits)
  11. 17 Jun 2011 (1 commit)
    • slab, slub, slob: Unify alignment definition · 3192b920
      Committed by Christoph Lameter
      Every slab has its own alignment definition in include/linux/sl?b_def.h.
      Extract those and define a common set in include/linux/slab.h (see the
      sketch after this entry).
      
      SLOB: As noted, sometimes we need double-word alignment on 32 bit.  This
      gives all structures allocated by SLOB an unsigned long long alignment,
      like the others.
      
      SLAB: If ARCH_SLAB_MINALIGN is not set, SLAB would set ARCH_SLAB_MINALIGN
      to zero, meaning no alignment at all.  Give it the default unsigned long
      long alignment.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      3192b920
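      Roughly the shape of the unified defaults (a sketch; the authoritative definitions live in include/linux/slab.h):

      #ifndef ARCH_KMALLOC_MINALIGN
      #define ARCH_KMALLOC_MINALIGN   __alignof__(unsigned long long)
      #endif

      #ifndef ARCH_SLAB_MINALIGN
      #define ARCH_SLAB_MINALIGN      __alignof__(unsigned long long)
      #endif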
  12. 21 May 2011 (1 commit)
  13. 08 May 2011 (1 commit)
    • slub: Remove CONFIG_CMPXCHG_LOCAL ifdeffery · 1759415e
      Committed by Christoph Lameter
      Remove the #ifdefs.  This means that irqsafe_cpu_cmpxchg_double() is used
      everywhere.
      
      There may be performance implications since:
      
      A. We now have to manage a transaction ID for all arches.
      
      B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL
      is reduced to a very short irqoff section.
      
      There are no multiple irqoff/irqon sequences as a result of this change.
      Even in the fallback case we only have to do one disable and enable, like
      before.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      1759415e
  14. 23 Mar 2011 (1 commit)
  15. 12 Mar 2011 (1 commit)
  16. 11 Mar 2011 (2 commits)
    • Lockless (and preemptless) fastpaths for slub · 8a5ec0ba
      Committed by Christoph Lameter
      Use the this_cpu_cmpxchg_double functionality to implement a lockless
      allocation algorithm on arches that support fast this_cpu ops.
      
      Each of the per-cpu pointers is paired with a transaction id that
      ensures that updates of the per-cpu information can only occur in
      sequence on a certain cpu.
      
      A transaction id is a "long" integer composed of an event number and the
      cpu number (see the sketch after this entry).  The event number is
      incremented for every change to the per-cpu state.  This means that the
      cmpxchg instruction can verify, for an update, that nothing interfered,
      that we are updating the percpu structure for the processor where we
      picked up the information, and that we are also currently on that
      processor when we update the information.
      
      This results in a significant decrease of the overhead in the fastpaths.
      It also makes it easy to adopt the fast path for realtime kernels, since
      this is lockless and does not require the per-cpu area to stay current
      over the whole critical section.  It is only important that the per-cpu
      area is current at the beginning of the critical section and at the end.
      
      So there is no need even to disable preemption.
      
      Test results show that the fastpath cycle count is reduced by up to ~40%
      (the alloc/free test goes from ~140 cycles down to ~80).  The slowpath
      for kfree adds a few cycles.
      
      Sadly this does nothing for the slowpath, which is where the main issues
      with performance in slub are, but the best-case performance rises
      significantly.  (For that, see the more complex slub patches that
      require cmpxchg_double.)
      
      Kmalloc: alloc/free test
      
      Before:
      
      10000 times kmalloc(8)/kfree -> 134 cycles
      10000 times kmalloc(16)/kfree -> 152 cycles
      10000 times kmalloc(32)/kfree -> 144 cycles
      10000 times kmalloc(64)/kfree -> 142 cycles
      10000 times kmalloc(128)/kfree -> 142 cycles
      10000 times kmalloc(256)/kfree -> 132 cycles
      10000 times kmalloc(512)/kfree -> 132 cycles
      10000 times kmalloc(1024)/kfree -> 135 cycles
      10000 times kmalloc(2048)/kfree -> 135 cycles
      10000 times kmalloc(4096)/kfree -> 135 cycles
      10000 times kmalloc(8192)/kfree -> 144 cycles
      10000 times kmalloc(16384)/kfree -> 754 cycles
      
      After:
      
      10000 times kmalloc(8)/kfree -> 78 cycles
      10000 times kmalloc(16)/kfree -> 78 cycles
      10000 times kmalloc(32)/kfree -> 82 cycles
      10000 times kmalloc(64)/kfree -> 88 cycles
      10000 times kmalloc(128)/kfree -> 79 cycles
      10000 times kmalloc(256)/kfree -> 79 cycles
      10000 times kmalloc(512)/kfree -> 85 cycles
      10000 times kmalloc(1024)/kfree -> 82 cycles
      10000 times kmalloc(2048)/kfree -> 82 cycles
      10000 times kmalloc(4096)/kfree -> 85 cycles
      10000 times kmalloc(8192)/kfree -> 82 cycles
      10000 times kmalloc(16384)/kfree -> 706 cycles
      
      Kmalloc: Repeatedly allocate then free test
      
      Before:
      
      10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
      10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
      10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
      10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
      10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
      10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
      10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
      10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
      10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
      10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
      10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
      10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles
      
      After:
      
      10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
      10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
      10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
      10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
      10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
      10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
      10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
      10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
      10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
      10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
      10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
      10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      8a5ec0ba
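      A userspace sketch of the tid encoding described in this entry (the constant and helper names are illustrative, not the kernel's exact macros): the low bits carry the cpu number, the high bits an event counter, so a tid picked up earlier can be recognized as stale if either the cpu changed or the per-cpu state changed.

      #include <stdio.h>

      #define TID_STEP 64UL   /* >= number of cpus, power of two */

      static unsigned long init_tid(unsigned int cpu)      { return cpu; }
      static unsigned long next_tid(unsigned long tid)     { return tid + TID_STEP; }
      static unsigned int  tid_to_cpu(unsigned long tid)   { return (unsigned int)(tid % TID_STEP); }
      static unsigned long tid_to_event(unsigned long tid) { return tid / TID_STEP; }

      int main(void)
      {
          unsigned long tid = init_tid(3);

          tid = next_tid(tid);        /* one per-cpu state change on cpu 3 */
          printf("cpu=%u event=%lu\n", tid_to_cpu(tid), tid_to_event(tid));
          return 0;
      }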
    • slub: min_partial needs to be in first cacheline · 1a757fe5
      Committed by Christoph Lameter
      It is used in unfreeze_slab(), which is a performance-critical
      function.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      1a757fe5
  17. 06 Nov 2010 (1 commit)
    • slub tracing: move trace calls out of always inlined functions to reduce kernel code size · 4a92379b
      Committed by Richard Kennedy
      Having the trace calls defined in the always-inlined kmalloc functions
      in include/linux/slub_def.h causes a lot of code duplication, as the
      trace functions get instantiated for each kmalloc call site.  This can
      simply be avoided by pushing the trace calls down into the functions in
      slub.c (see the sketch after this entry).
      
      On my x86_64 build this patch shrinks the code size of the kernel by
      approx 36K and also shrinks the code size of many modules -- too many to
      list here ;)
      
      size vmlinux (2.6.36) reports
             text        data     bss     dec     hex filename
          5410611	 743172	 828928	6982711	 6a8c37	vmlinux
          5373738	 744244	 828928	6946910	 6a005e	vmlinux + patch
      
      The resulting kernel has had some testing & kmalloc trace still seems to
      work.
      
      This patch
      - moves trace_kmalloc out of the inlined kmalloc() and pushes it down
      into kmem_cache_alloc_trace(), so it only gets instantiated once.
      
      - renames kmem_cache_alloc_notrace() to kmem_cache_alloc_trace() to
      indicate that it now does have tracing.  (Maybe this would be better
      called something like kmalloc_kmem_cache?)
      
      - adds a new function kmalloc_order() to handle allocation and tracing
      of large allocations of page order.
      
      - removes tracing from the inlined kmalloc_large(), replacing it with a
      call to kmalloc_order().
      
      - moves tracing out of the inlined kmalloc_node() and pushes it down
      into kmem_cache_alloc_node_trace().
      
      - renames kmem_cache_alloc_node_notrace() to
      kmem_cache_alloc_node_trace().
      
      - removes the include of trace/events/kmem.h from slub_def.h.
      
      v2
      - keep kmalloc_order_trace inline when !CONFIG_TRACE
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      4a92379b
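      A userspace sketch of the code-size pattern only, with hypothetical names throughout (malloc and fprintf stand in for the real allocator and the tracepoint): the inline wrapper duplicated at each call site stays trace-free, and the trace call lives in one out-of-line helper, so it is instantiated once.

      #include <stdio.h>
      #include <stdlib.h>

      /* one out-of-line copy: this is where the "trace" happens */
      static void *cache_alloc_trace(size_t size)
      {
          void *ret = malloc(size);                 /* stand-in for the real allocator */

          fprintf(stderr, "trace: alloc %zu -> %p\n", size, ret);
          return ret;
      }

      /* header-style inline wrapper: duplicated at call sites, but trace-free */
      static inline void *my_kmalloc(size_t size)
      {
          return cache_alloc_trace(size);
      }

      int main(void)
      {
          void *p = my_kmalloc(64);

          free(p);
          return 0;
      }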
  18. 06 Oct 2010 (1 commit)
  19. 02 Oct 2010 (2 commits)
  20. 11 Aug 2010 (1 commit)
    • dma-mapping: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN · a6eb9fe1
      Committed by FUJITA Tomonori
      Now each architecture has its own dma_get_cache_alignment
      implementation.
      
      dma_get_cache_alignment returns the minimum DMA alignment.
      Architectures define it as ARCH_KMALLOC_MINALIGN (it's used to make sure
      that a malloc'ed buffer is DMA-safe; the buffer doesn't share a cache
      line with others).  So we can unify the dma_get_cache_alignment
      implementations (see the sketch after this entry).
      
      This patch:
      
      dma_get_cache_alignment() needs to know whether an architecture defines
      ARCH_KMALLOC_MINALIGN or not (i.e. whether the architecture has a DMA
      alignment restriction).  However, slab.h defines ARCH_KMALLOC_MINALIGN
      if the architecture doesn't define it.
      
      Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
      ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
      (except for crypto).
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6eb9fe1
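      A sketch of the unified helper this rename makes possible (illustrative; close in shape to the eventual generic implementation, but not quoted from it):

      static inline int dma_get_cache_alignment(void)
      {
      #ifdef ARCH_DMA_MINALIGN
          return ARCH_DMA_MINALIGN;   /* the arch has a DMA alignment restriction */
      #else
          return 1;                   /* no restriction */
      #endif
      }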
  21. 09 Aug 2010 (1 commit)
  22. 09 Jun 2010 (1 commit)
  23. 30 May 2010 (1 commit)
  24. 25 May 2010 (1 commit)
  25. 20 May 2010 (1 commit)
  26. 20 Dec 2009 (3 commits)
    • SLUB: this_cpu: Remove slub kmem_cache fields · ff12059e
      Committed by Christoph Lameter
      Remove the fields in struct kmem_cache_cpu that were used to cache data
      from struct kmem_cache when they were in different cachelines.  The
      cacheline that holds the per-cpu array pointer now also holds these
      values.  We can cut the struct kmem_cache_cpu size down to almost half.
      
      The get_freepointer() and set_freepointer() functions that used to be
      intended only for the slow path are now also useful for the hot path,
      since access to the size field no longer requires touching an additional
      cacheline (see the sketch after this entry).  This results in consistent
      use of functions for setting the freepointer of objects throughout SLUB.
      
      Also, we initialize all possible kmem_cache_cpu structures when a slab
      is created.  There is no need to initialize them when a processor or
      node comes online.
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      ff12059e
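      A userspace model of what get_freepointer()/set_freepointer() do conceptually (struct cache_model and the offset of 0 are illustrative): each free object stores the pointer to the next free object at a fixed offset inside itself.

      #include <stdio.h>
      #include <stddef.h>

      struct cache_model { size_t offset; };   /* where the free pointer lives */

      static void *get_freepointer(struct cache_model *s, void *object)
      {
          return *(void **)((char *)object + s->offset);
      }

      static void set_freepointer(struct cache_model *s, void *object, void *fp)
      {
          *(void **)((char *)object + s->offset) = fp;
      }

      int main(void)
      {
          unsigned long obj_a[8], obj_b[8];
          struct cache_model s = { .offset = 0 };

          set_freepointer(&s, obj_a, obj_b);       /* obj_a -> obj_b on the freelist */
          printf("%d\n", get_freepointer(&s, obj_a) == (void *)obj_b);  /* prints 1 */
          return 0;
      }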
    • SLUB: Get rid of dynamic DMA kmalloc cache allocation · 756dee75
      Committed by Christoph Lameter
      Dynamic DMA kmalloc cache allocation is troublesome since the
      new percpu allocator does not support allocations in atomic contexts.
      Reserve some statically allocated kmalloc_cpu structures instead.
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      756dee75
    • SLUB: Use this_cpu operations in slub · 9dfc6e68
      Committed by Christoph Lameter
      Using per-cpu allocations removes the need for the per-cpu arrays in the
      kmem_cache struct.  These could get quite big if we have to support
      systems with thousands of cpus.  The use of this_cpu_xx operations
      results in (see the sketch after this entry):
      
      1. The size of kmem_cache for SMP configurations shrinks, since we will
         only need 1 pointer instead of NR_CPUS.  The same pointer can be used
         by all processors.  This reduces the cache footprint of the allocator.
      
      2. We can dynamically size kmem_cache according to the actual nodes in
         the system, meaning less memory overhead for configurations that may
         potentially support up to 1k NUMA nodes / 4k cpus.
      
      3. We can remove the fiddling with allocating and releasing of
         kmem_cache_cpu structures when bringing up and shutting down cpus.
         The cpu alloc logic will do it all for us.  This removes some
         portions of the cpu hotplug functionality.
      
      4. Fastpath performance increases since per-cpu pointer lookups and
         address calculations are avoided.
      
      V7-V8
      - Convert missed get_cpu_slab() under CONFIG_SLUB_STATS
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      9dfc6e68
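      A sketch of the structural change only, with placeholder types (the real structs contain much more, and __percpu is stubbed out here):

      #define NR_CPUS 4096
      #define __percpu                 /* annotation only; ignored in this sketch */

      struct kmem_cache_cpu { void **freelist; void *page; };

      /* before: one pointer slot per possible cpu, sized at compile time */
      struct kmem_cache_before {
          struct kmem_cache_cpu *cpu_slab[NR_CPUS];
      };

      /* after: a single per-cpu pointer, shared by all processors and
       * resolved with this_cpu_*() operations at run time */
      struct kmem_cache_after {
          struct kmem_cache_cpu __percpu *cpu_slab;
      };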
  27. 11 Dec 2009 (1 commit)
  28. 30 Aug 2009 (1 commit)
  29. 06 Aug 2009 (1 commit)
  30. 08 Jul 2009 (1 commit)
  31. 12 Jun 2009 (1 commit)
    • slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Committed by Pekka Enberg
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNELs in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set from
        mem_init() (not perfect but works in practice).  And we still have
        code using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab
      allocators in early boot code to avoid enabling interrupts (see the
      sketch after this entry).
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      7e85ee0c
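      A sketch of the masking idea with illustrative names, bit values, and hook point (the real flags and where the mask is applied live in the slab allocators and gfp code): while the system is still booting, strip the flags that could lead to sleeping or to interrupts being re-enabled before they reach the allocator.

      #define __GFP_WAIT  0x10u
      #define __GFP_IO    0x40u
      #define __GFP_FS    0x80u

      static int slab_is_fully_up;    /* set once early boot is finished */

      static unsigned int early_boot_gfp_fixup(unsigned int flags)
      {
          if (!slab_is_fully_up)
              flags &= ~(__GFP_WAIT | __GFP_IO | __GFP_FS);
          return flags;
      }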
  32. 12 Apr 2009 (1 commit)