  1. 05 June 2014, 2 commits
    • sl[au]b: charge slabs to kmemcg explicitly · 5dfb4175
      Authored by Vladimir Davydov
      We have only a few places where we actually want to charge kmem so
      instead of intruding into the general page allocation path with
      __GFP_KMEMCG it's better to explicitly charge kmem there.  All kmem
      charges will be easier to follow that way.
      
      This is a step towards removing __GFP_KMEMCG.  It removes __GFP_KMEMCG
      from memcg caches' allocflags.  Instead it makes slab allocation path
      call memcg_charge_kmem directly getting memcg to charge from the cache's
      memcg params.
      
      This also eliminates any possibility of misaccounting an allocation
      going from one memcg's cache to another memcg, because now we always
      charge slabs against the memcg the cache belongs to.  That's why this
      patch removes the big comment to memcg_kmem_get_cache.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5dfb4175
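      The flow below is a hedged, user-space sketch of the idea in the
      commit above: a memcg cache charges new slab pages directly against
      the memcg recorded in the cache itself, instead of routing the charge
      through a GFP flag.  The types and the charge_kmem()/alloc_slab_page()
      helpers are illustrative stand-ins, not the kernel's memcg_charge_kmem
      implementation.

        /* Sketch: charge a new slab page to the cache's owning memcg. */
        #include <stdbool.h>
        #include <stddef.h>
        #include <stdio.h>

        struct mem_cgroup {
                const char *name;
                size_t kmem_charged;
                size_t kmem_limit;
        };

        struct kmem_cache {
                const char *name;
                struct mem_cgroup *memcg;       /* NULL for a root cache */
        };

        /* Hypothetical analogue of memcg_charge_kmem(): account 'bytes'
         * to the cache's memcg, failing if the limit would be exceeded. */
        static bool charge_kmem(struct mem_cgroup *memcg, size_t bytes)
        {
                if (!memcg)
                        return true;            /* root cache: nothing to charge */
                if (memcg->kmem_charged + bytes > memcg->kmem_limit)
                        return false;
                memcg->kmem_charged += bytes;
                return true;
        }

        /* Allocating a slab page charges the owning memcg explicitly, so
         * the charge can never land on a different memcg than the cache's. */
        static bool alloc_slab_page(struct kmem_cache *cache, size_t page_bytes)
        {
                if (!charge_kmem(cache->memcg, page_bytes))
                        return false;
                /* ...allocate and initialize the actual page here... */
                return true;
        }

        int main(void)
        {
                struct mem_cgroup g = { "groupA", 0, 16384 };
                struct kmem_cache c = { "kmalloc-64:groupA", &g };
                bool ok = alloc_slab_page(&c, 4096);

                printf("charge ok: %d, charged: %zu bytes\n", ok, g.kmem_charged);
                return 0;
        }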
    • mm, slab: suppress out of memory warning unless debug is enabled · 9a02d699
      Authored by David Rientjes
      When the slab or slub allocators cannot allocate additional slab pages,
      they emit diagnostic information to the kernel log such as current
      number of slabs, number of objects, active objects, etc.  This is always
      coupled with a page allocation failure warning since it is controlled by
      !__GFP_NOWARN.
      
      Suppress this out of memory warning if the allocator is configured
      without debug support.  The page allocation failure warning will
      indicate it is a failed slab allocation, the order, and the gfp mask, so
      the extra output is only useful for diagnosing allocator issues.
      
      Since CONFIG_SLUB_DEBUG is already enabled by default for the slub
      allocator, there is no functional change with this patch.  If debug is
      disabled, however, the warnings are now suppressed.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a02d699
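      A minimal sketch of the behaviour described above, assuming a
      compile-time debug switch: the slab diagnostic dump is emitted only
      when debugging is built in and the caller did not pass the no-warn
      flag.  SLAB_DEBUG and GFP_NOWARN are illustrative stand-ins for
      CONFIG_SLUB_DEBUG/CONFIG_DEBUG_SLAB and __GFP_NOWARN.

        #include <stdio.h>

        #define GFP_NOWARN      0x1u            /* stand-in for __GFP_NOWARN */
        #ifndef SLAB_DEBUG
        #define SLAB_DEBUG      0               /* pretend debug support is configured out */
        #endif

        static void slab_out_of_memory(const char *cache, unsigned int gfpflags,
                                       int order)
        {
                if (!SLAB_DEBUG || (gfpflags & GFP_NOWARN))
                        return;                 /* suppressed */
                fprintf(stderr, "SLAB: unable to grow cache %s (order %d, gfp 0x%x)\n",
                        cache, order, gfpflags);
                /* ...a real allocator would dump slab/object counts here... */
        }

        int main(void)
        {
                slab_out_of_memory("kmalloc-128", 0, 0);   /* silent with SLAB_DEBUG=0 */
                return 0;
        }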
  2. 06 May 2014, 2 commits
    • slab: Fix off by one in object max number tests. · 30321c7b
      Authored by David Miller
      If freelist_idx_t is a byte, SLAB_OBJ_MAX_NUM should be 255 not 256, and
      likewise if freelist_idx_t is a short, then it should be 65535 not
      65536.
      
      This was leading to all kinds of random crashes on sparc64 where
      PAGE_SIZE is 8192.  One problem shown was that if spinlock debugging was
      enabled, we'd get deadlocks in copy_pte_range() or do_wp_page() with the
      same cpu already holding a lock it shouldn't hold, or the lock belonging
      to a completely unrelated process.
      
      Fixes: a41adfaa ("slab: introduce byte sized index for the freelist of a slab")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30321c7b
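      A small sketch of the off-by-one, assuming a byte-sized freelist_idx_t
      as on most configurations: the largest count that is guaranteed to fit
      back into the index type is (2^bits) - 1, so the cap must be 255 for a
      1-byte index (and 65535 for a 2-byte one).  The macro names are
      illustrative, not the kernel's exact definitions.

        #include <limits.h>
        #include <stdio.h>

        typedef unsigned char freelist_idx_t;   /* byte-sized freelist index */

        /* buggy form: evaluates to 256, which the index type cannot hold */
        #define OBJ_MAX_BUGGY  (1 << (sizeof(freelist_idx_t) * CHAR_BIT))
        /* fixed form: the largest value the index type can actually store */
        #define OBJ_MAX_FIXED  ((1 << (sizeof(freelist_idx_t) * CHAR_BIT)) - 1)

        int main(void)
        {
                freelist_idx_t wrapped = (freelist_idx_t)OBJ_MAX_BUGGY;

                printf("buggy cap = %d, fixed cap = %d\n",
                       (int)OBJ_MAX_BUGGY, (int)OBJ_MAX_FIXED);
                printf("256 stored in a byte index silently becomes %u\n",
                       (unsigned int)wrapped);
                return 0;
        }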
    • slab: fix the type of the index on freelist index accessor · 7cc68973
      Authored by Joonsoo Kim
      Commit a41adfaa ("slab: introduce byte sized index for the freelist
      of a slab") changes the size of the freelist index and also changes the
      prototype of the accessor functions for the freelist index.  And there
      was a mistake.

      The mistake is that although it changes the size of the freelist index
      correctly, it changes the type of the position used to address the
      freelist incorrectly.  With the patch, a freelist index can be 1 or 2
      bytes, which means the number of objects on a slab can be more than
      255.  So we need more than 1 byte for the position used to find a free
      object on the freelist.  But the above patch makes this position type
      1 byte, so a slab with more than 255 objects cannot work properly and,
      as a consequence, the system cannot boot.
      
      This issue was reported by Steven King on m68knommu which would use
      2 bytes freelist index:
      
        https://lkml.org/lkml/2014/4/16/433
      
      The fix is easy: changing the type of the position argument in the
      freelist accessor functions is enough.  Although 2 bytes would be
      enough, I use 4 bytes since it has no bad effect and makes things
      simpler.  This fix was suggested and tested by Steven in his original
      report.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-and-acked-by: Steven King <sfking@fdwdc.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Tested-by: James Hogan <james.hogan@imgtec.com>
      Tested-by: David Miller <davem@davemloft.net>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cc68973
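      A hedged sketch of the fix, assuming two-byte freelist entries as on
      larger page sizes: the values stored in the freelist stay in the small
      freelist_idx_t type, but the position used to address the freelist is
      a plain unsigned int, so slabs with more than 255 objects still work.
      These helpers are simplified stand-ins for the kernel's
      get_free_obj()/set_free_obj().

        #include <stdio.h>

        typedef unsigned short freelist_idx_t;  /* 2-byte entries, e.g. on 8K pages */

        static inline freelist_idx_t get_free_obj(const freelist_idx_t *freelist,
                                                  unsigned int idx)   /* wide position */
        {
                return freelist[idx];
        }

        static inline void set_free_obj(freelist_idx_t *freelist,
                                        unsigned int idx, freelist_idx_t val)
        {
                freelist[idx] = val;
        }

        int main(void)
        {
                freelist_idx_t freelist[1024];  /* a slab holding 1024 objects */

                set_free_obj(freelist, 300, 300);   /* position 300 would wrap in a byte */
                printf("entry 300 = %u\n", (unsigned int)get_free_obj(freelist, 300));
                return 0;
        }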
  3. 11 April 2014, 1 commit
  4. 08 April 2014, 2 commits
    • mm, mempolicy: remove per-process flag · f0432d15
      Authored by David Rientjes
      PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
      There's no significant performance degradation from checking
      current->mempolicy rather than current->flags & PF_MEMPOLICY in the
      allocation path, especially since this check is marked unlikely().

      Running TCP_RR with netperf-2.4.5 through localhost on a 16-cpu machine
      with 64GB of memory and without a mempolicy:
      
      	threads		before		after
      	16		1249409		1244487
      	32		1281786		1246783
      	48		1239175		1239138
      	64		1244642		1241841
      	80		1244346		1248918
      	96		1266436		1254316
      	112		1307398		1312135
      	128		1327607		1326502
      
      Per-process flags are a scarce resource, so we should free them up
      whenever possible and make them available.  We'll be using this one
      shortly for memcg oom reserves.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0432d15
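      A rough sketch of the simplification, with the task structure and the
      policy-aware path reduced to stand-ins: the allocation path simply
      tests the task's mempolicy pointer under unlikely(), so no per-process
      flag is needed.

        #include <stddef.h>
        #include <stdio.h>

        #define unlikely(x)     __builtin_expect(!!(x), 0)

        struct mempolicy { int mode; };
        struct task { struct mempolicy *mempolicy; unsigned int flags; };

        static struct task current_task;        /* stand-in for the running task */
        #define current (&current_task)

        static void *alternate_node_alloc(size_t size)
        {
                printf("policy-aware path for %zu bytes\n", size);
                return NULL;                    /* placeholder */
        }

        static void *slab_alloc(size_t size)
        {
                /* after the patch: no PF_MEMPOLICY test, just the pointer */
                if (unlikely(current->mempolicy))
                        return alternate_node_alloc(size);
                /* ...ordinary fast path... */
                return NULL;
        }

        int main(void)
        {
                struct mempolicy pol = { 1 };

                slab_alloc(64);                 /* fast path: no policy installed */
                current->mempolicy = &pol;
                slab_alloc(64);                 /* slow path via the pointer test */
                return 0;
        }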
    • mm, mempolicy: rename slab_node for clarity · 2a389610
      Authored by David Rientjes
      slab_node() is actually a mempolicy function, so rename it to
      mempolicy_slab_node() to make it clearer that it is used for processes
      with mempolicies.
      
      At the same time, cleanup its code by saving numa_mem_id() in a local
      variable (since we require a node with memory, not just any node) and
      remove an obsolete comment that assumes the mempolicy is actually passed
      into the function.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a389610
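      The cleanup can be pictured with the hedged stand-in below: the local
      memory node is computed once into a local variable and reused as the
      fallback.  The policy modes, the policy argument, and the
      numa_mem_id() stub are simplified assumptions, not the kernel's actual
      signatures.

        #include <stdio.h>

        enum { MPOL_DEFAULT, MPOL_PREFERRED };
        struct mempolicy { int mode; int preferred_node; };

        static int numa_mem_id(void) { return 0; }      /* nearest node with memory */

        /* renamed from slab_node() to make the mempolicy connection obvious */
        static int mempolicy_slab_node(const struct mempolicy *policy)
        {
                int node = numa_mem_id();       /* saved once, reused below */

                if (!policy || policy->mode == MPOL_DEFAULT)
                        return node;
                if (policy->mode == MPOL_PREFERRED && policy->preferred_node >= 0)
                        return policy->preferred_node;
                return node;
        }

        int main(void)
        {
                struct mempolicy p = { MPOL_PREFERRED, 2 };

                printf("allocation node = %d\n", mempolicy_slab_node(&p));
                return 0;
        }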
  5. 04 April 2014, 1 commit
  6. 01 April 2014, 1 commit
  7. 19 February 2014, 1 commit
  8. 08 February 2014, 6 commits
    • slab: Make allocations with GFP_ZERO slightly more efficient · 5087c822
      Authored by Joe Perches
      Use the likely() mechanism already around the valid-pointer tests to
      better choose when to memset __GFP_ZERO allocations to zero.
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      5087c822
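      A tiny sketch of the change, with malloc() standing in for the slab
      fast path and GFP_ZERO for __GFP_ZERO: the zeroing is nested under the
      likely(ptr) test that already guards the returned pointer, keeping the
      common non-zeroed allocation on the predicted branch.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define likely(x)       __builtin_expect(!!(x), 1)
        #define unlikely(x)     __builtin_expect(!!(x), 0)
        #define GFP_ZERO        0x1u            /* stand-in for __GFP_ZERO */

        static void *cache_alloc(size_t size, unsigned int flags)
        {
                void *ptr = malloc(size);       /* stand-in for the slab fast path */

                if (likely(ptr) && unlikely(flags & GFP_ZERO))
                        memset(ptr, 0, size);
                return ptr;
        }

        int main(void)
        {
                char *p = cache_alloc(32, GFP_ZERO);

                printf("first byte of zeroed allocation: %d\n", p ? p[0] : -1);
                free(p);
                return 0;
        }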
    • slab: make more slab management structure off the slab · 8fc9cf42
      Authored by Joonsoo Kim
      Now that the size of the freelist used for slab management has
      diminished, the on-slab management structure can waste a lot of space
      if the slab's objects are large.

      Consider a cache with 128-byte objects.  If an on-slab freelist is
      used, 31 objects fit in the slab.  The freelist for this case would be
      31 bytes, so 97 bytes, that is, more than 75% of the object size, are
      wasted.

      In the 64-byte object case, no space is wasted if we use an on-slab
      freelist.  So set the off-slab determining constraint to 128 bytes.
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      8fc9cf42
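      The numbers in the commit above can be checked with the small
      calculation below; it assumes a 4096-byte page and one byte of
      freelist overhead per object, which is a simplification of the real
      cache_estimate() logic.

        #include <stdio.h>

        static void on_slab_waste(unsigned int page_size, unsigned int obj_size)
        {
                /* each object costs its own size plus one byte of freelist index */
                unsigned int num   = page_size / (obj_size + 1);
                unsigned int waste = page_size - num * (obj_size + 1);

                printf("%3u-byte objects: %2u per slab, %3u bytes wasted (%.0f%% of one object)\n",
                       obj_size, num, waste, 100.0 * waste / obj_size);
        }

        int main(void)
        {
                on_slab_waste(4096, 128);       /* 31 objects, 97 bytes wasted (~76%) */
                on_slab_waste(4096, 64);        /* 63 objects, only 1 byte left over */
                return 0;
        }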
    • slab: introduce byte sized index for the freelist of a slab · a41adfaa
      Authored by Joonsoo Kim
      Currently, the freelist of a slab consists of unsigned int sized
      indexes.  Since most slabs have fewer than 256 objects, such large
      indexes are needless.  For example, consider the minimum kmalloc slab:
      its object size is 32 bytes and it consists of one page, so 256
      byte-sized indexes are enough to cover all possible positions.

      There can be some slabs whose object size is 8 bytes.  We cannot handle
      this case with a byte-sized index, so we need to restrict the minimum
      object size.  Since these slabs are not major, the memory wasted by
      them would be negligible.

      Some architectures have a page size larger than 4096 bytes (one example
      is the 64KB page size on PPC or IA64), so a byte-sized index doesn't
      fit them.  In this case, we use a two-byte index.

      Below are some numbers for this patch.
      
      * Before *
      kmalloc-512          525    640    512    8    1 : tunables   54   27    0 : slabdata     80     80      0
      kmalloc-256          210    210    256   15    1 : tunables  120   60    0 : slabdata     14     14      0
      kmalloc-192         1016   1040    192   20    1 : tunables  120   60    0 : slabdata     52     52      0
      kmalloc-96           560    620    128   31    1 : tunables  120   60    0 : slabdata     20     20      0
      kmalloc-64          2148   2280     64   60    1 : tunables  120   60    0 : slabdata     38     38      0
      kmalloc-128          647    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
      kmalloc-32         11360  11413     32  113    1 : tunables  120   60    0 : slabdata    101    101      0
      kmem_cache           197    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0
      
      * After *
      kmalloc-512          521    648    512    8    1 : tunables   54   27    0 : slabdata     81     81      0
      kmalloc-256          208    208    256   16    1 : tunables  120   60    0 : slabdata     13     13      0
      kmalloc-192         1029   1029    192   21    1 : tunables  120   60    0 : slabdata     49     49      0
      kmalloc-96           529    589    128   31    1 : tunables  120   60    0 : slabdata     19     19      0
      kmalloc-64          2142   2142     64   63    1 : tunables  120   60    0 : slabdata     34     34      0
      kmalloc-128          660    682    128   31    1 : tunables  120   60    0 : slabdata     22     22      0
      kmalloc-32         11716  11780     32  124    1 : tunables  120   60    0 : slabdata     95     95      0
      kmem_cache           197    210    192   21    1 : tunables  120   60    0 : slabdata     10     10      0
      
      kmem_caches whose objects are 256 bytes or smaller hold one or more
      additional objects per slab than before.  In the case of kmalloc-32, we
      have 11 more objects, so 352 bytes (11 * 32) are saved, which is
      roughly a 9% memory saving.  Of course, this percentage decreases as
      the number of objects in a slab decreases.
      
      Here are the performance results on my 4 cpus machine.
      
      * Before *
      
       Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):
      
             229,945,138 cache-misses                                                  ( +-  0.23% )
      
            11.627897174 seconds time elapsed                                          ( +-  0.14% )
      
      * After *
      
       Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):
      
             218,640,472 cache-misses                                                  ( +-  0.42% )
      
            11.504999837 seconds time elapsed                                          ( +-  0.21% )
      
      Cache misses are reduced by this patchset by roughly 5%, and elapsed
      time improves by about 1%.
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      a41adfaa
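      The index-width choice can be sketched as below, under the commit's
      own assumptions (a 16-byte minimum object size for the optimization
      and one page per slab); OBJ_MIN_SIZE and freelist_entry_size() are
      illustrative names, not the kernel's.

        #include <stdio.h>

        #define OBJ_MIN_SIZE    16u     /* smallest object covered by the byte index */

        static unsigned int freelist_entry_size(unsigned int page_size)
        {
                /* worst case: the smallest allowed object packs the most indexes */
                unsigned int max_objs = page_size / OBJ_MIN_SIZE;

                return (max_objs <= 256) ? 1 : 2;
        }

        int main(void)
        {
                printf("4KB pages:  %u-byte freelist entries\n", freelist_entry_size(4096));
                printf("64KB pages: %u-byte freelist entries\n", freelist_entry_size(65536));
                return 0;
        }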
    • slab: restrict the number of objects in a slab · f315e3fa
      Authored by Joonsoo Kim
      To prepare for implementing a byte-sized index for managing the
      freelist of a slab, we should restrict the number of objects in a slab
      to at most 256, since a byte can only represent 256 different values.
      Setting the object size to a value greater than or equal to the newly
      introduced SLAB_OBJ_MIN_SIZE ensures that the number of objects in a
      one-page slab is at most 256.

      If the page size is larger than 4096 bytes, the above assumption would
      be wrong.  In that case, we fall back on a 2-byte index.

      If the minimum kmalloc size is less than 16 bytes, we use it as the
      minimum object size and give up this optimization.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      f315e3fa
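      The constraint reduces to simple arithmetic; the check below assumes a
      4096-byte page and is only a numeric illustration of why the minimum
      object size works out to 16 bytes there.

        #include <stdio.h>

        int main(void)
        {
                unsigned int page_size = 4096;
                unsigned int obj_min   = page_size / 256;       /* = 16 bytes */

                printf("minimum object size:            %u\n", obj_min);
                printf("objects per page at that size:  %u (byte index is enough)\n",
                       page_size / obj_min);
                printf("objects per page at 8 bytes:    %u (too many for one byte)\n",
                       page_size / 8);
                return 0;
        }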
    • slab: introduce helper functions to get/set free object · e5c58dfd
      Authored by Joonsoo Kim
      In the following patches, the way free objects are fetched from and
      stored to the freelist is changed so that a simple cast no longer
      works.  Therefore, introduce helper functions.
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      e5c58dfd
    • slab: factor out calculate nr objects in cache_estimate · 9cef2e2b
      Authored by Joonsoo Kim
      This logic is not simple to understand, so make it a separate function
      to help readability.  Additionally, we can reuse this change in the
      following patch, which lets the freelist use a differently sized index
      according to the number of objects.
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      9cef2e2b
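      A hedged outline of the factored-out helper: given the usable slab
      size, the per-object size and the per-object freelist overhead, it
      returns how many objects fit.  The name matches the spirit of the
      split-out function, but alignment handling and the exact kernel
      signature are omitted.

        #include <stdio.h>

        static unsigned int calculate_nr_objs(unsigned long slab_size,
                                              unsigned int buffer_size,
                                              unsigned int idx_size)
        {
                if (idx_size == 0)              /* off-slab freelist: objects only */
                        return slab_size / buffer_size;
                /* on-slab freelist: each object also costs one index entry */
                return slab_size / (buffer_size + idx_size);
        }

        int main(void)
        {
                printf("on-slab,  128-byte objects: %u\n", calculate_nr_objs(4096, 128, 1));
                printf("off-slab, 128-byte objects: %u\n", calculate_nr_objs(4096, 128, 0));
                return 0;
        }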
  9. 31 January 2014, 1 commit
  10. 13 November 2013, 1 commit
  11. 30 October 2013, 2 commits
  12. 25 October 2013, 15 commits
  13. 15 July 2013, 1 commit
    • kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Authored by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  14. 08 July 2013, 1 commit
  15. 07 July 2013, 3 commits