- 10 Oct, 2014 5 commits
-
-
Committed by Joonsoo Kim
Slab merging is a good feature for reducing fragmentation. If a newly created slab cache has a similar size and properties to an existing one, this feature reuses the existing cache rather than creating a new one. As a result, objects are packed into fewer slabs, so fragmentation is reduced.

Below is the result of my testing.

* After boot, sleep 20; cat /proc/meminfo | grep Slab

<Before>
Slab: 25136 kB

<After>
Slab: 24364 kB

We can save 3% of the memory used by slab.

To support this feature in SLAB, we need to implement SLAB-specific kmem_cache_flag() and __kmem_cache_alias(), because SLUB implements some SLUB-specific processing related to debug flags and object size changes in these functions.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
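The merge decision described in this message ("similar size and property") can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: struct cache_desc, find_mergeable_sketch() and the one-word waste limit are hypothetical and are not the kernel's actual data structures or criteria.

```c
#include <stddef.h>

/* Hypothetical, simplified cache descriptor; not the kernel's struct kmem_cache. */
struct cache_desc {
    size_t size;         /* object size in bytes */
    size_t align;        /* alignment guaranteed by this cache */
    unsigned long flags; /* property flags, which must match exactly */
    int refcount;        /* number of users sharing this cache */
};

/*
 * Return an existing cache that can serve a new request, or NULL if a
 * new cache would have to be created. Assumed policy: same flags,
 * compatible alignment, and less than one machine word wasted per object.
 */
static struct cache_desc *find_mergeable_sketch(struct cache_desc *caches,
                                                size_t n, size_t size,
                                                size_t align,
                                                unsigned long flags)
{
    for (size_t i = 0; i < n; i++) {
        struct cache_desc *c = &caches[i];

        if (c->flags != flags)
            continue;               /* properties differ */
        if (c->size < size)
            continue;               /* objects would not fit */
        if (c->size - size >= sizeof(void *))
            continue;               /* too much waste per object */
        if (align == 0 || c->align % align != 0)
            continue;               /* alignment not satisfied */

        c->refcount++;              /* reuse instead of creating */
        return c;
    }
    return NULL;
}
```

With caches of 64 and 192 bytes, a request for 62-byte objects with the same flags would reuse the 64-byte cache, while a 100-byte request would not match either one.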
-
Committed by Joonsoo Kim
cache_free_alien() is a rarely used function that only matters on a node mismatch. However, it is defined with the inline attribute, so it is inlined into __cache_free(), which is the core free function of the slab allocator. This needlessly makes kmem_cache_free()/kfree() large. All we really need to inline is the node-match check, so this patch factors out the other parts of cache_free_alien() to reduce the code size of kmem_cache_free()/kfree().

<Before>
nm -S mm/slab.o | grep -e "T kfree" -e "T kmem_cache_free"
00000000000011e0 0000000000000228 T kfree
0000000000000670 0000000000000216 T kmem_cache_free

<After>
nm -S mm/slab.o | grep -e "T kfree" -e "T kmem_cache_free"
0000000000001110 00000000000001b5 T kfree
0000000000000750 0000000000000181 T kmem_cache_free

The text size is slightly reduced: 0x228->0x1b5, 0x216->0x181.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
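The split described in this message, with only the cheap check inlined and the rare work kept out of line, can be sketched in userspace C. This is a minimal illustration and not the kernel code: free_alien_check() and free_alien_slow() are hypothetical names, and noinline and __builtin_expect are GCC/Clang extensions.

```c
#include <stdbool.h>

static int slow_path_calls; /* counts how often the out-of-line path runs */

/* Rare path: kept out of line so it does not bloat every caller. */
static __attribute__((noinline)) bool free_alien_slow(int obj_node, int local_node)
{
    (void)obj_node;
    (void)local_node;
    slow_path_calls++;
    return true; /* object was handed to the remote-node path */
}

/* Hot path: only the node comparison is inlined into the caller. */
static inline bool free_alien_check(int obj_node, int local_node)
{
    if (__builtin_expect(obj_node == local_node, 1))
        return false; /* caller continues with the normal local free */
    return free_alien_slow(obj_node, local_node);
}
```

A caller would test the return value of free_alien_check() first and fall through to its normal free logic when it is false, so the common case costs one comparison and one well-predicted branch.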
-
Committed by Joonsoo Kim
Our intention with __ac_put_obj() is that it should not affect anything if sk_memalloc_socks() is disabled. But because __ac_put_obj() is so small, the compiler inlines it into ac_put_obj(), which affects the code size of the free path. This patch adds the noinline keyword to __ac_put_obj() so that it does not disrupt the normal free path at all.

<Before>
nm -S slab-orig.o | grep -e "t cache_alloc_refill" -e "T kfree" -e "T kmem_cache_free"
0000000000001e80 00000000000002f5 t cache_alloc_refill
0000000000001230 0000000000000258 T kfree
0000000000000690 000000000000024c T kmem_cache_free

<After>
nm -S slab-patched.o | grep -e "t cache_alloc_refill" -e "T kfree" -e "T kmem_cache_free"
0000000000001e00 00000000000002e5 t cache_alloc_refill
00000000000011e0 0000000000000228 T kfree
0000000000000670 0000000000000216 T kmem_cache_free

cache_alloc_refill: 0x2f5->0x2e5
kfree: 0x258->0x228
kmem_cache_free: 0x24c->0x216

The code size of each function is reduced slightly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Currently, due to the likely keyword, the compiled code of cache_flusharray() is placed in the unlikely.text section. Although it is an uncommon case compared to freeing to the cpu cache, it is more common than free_block(). Yet free_block() is in the normal text section. This patch fixes this odd situation by removing the likely keyword.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Currently, we track the caller if tracing or slab debugging is enabled. If they are disabled, we could save the overhead of passing one argument by calling __kmalloc(_node)(). But I think the gain would be marginal. Furthermore, the default slab allocator, SLUB, doesn't use this technique, so I think it's okay to change this.

After this change, we can turn CONFIG_DEBUG_SLAB on/off without a full kernel build and remove some complicated '#if' definitions. That looks more beneficial to me.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 Sep, 2014 1 commit
-
-
Committed by David Rientjes
Since commit 45906855 ("mm/sl[aou]b: Common alignment code"), the "ralign" automatic variable in __kmem_cache_create() may be used uninitialized.

The proper alignment defaults to BYTES_PER_WORD and can be overridden by SLAB_RED_ZONE or the alignment specified by the caller.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrei Elovikov <a.elovikov@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
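The rule stated in this message (a default that is always set, then optionally overridden) can be sketched as follows. This is a userspace illustration under assumptions: the *_SKETCH macros and pick_alignment() are hypothetical stand-ins, not the kernel's BYTES_PER_WORD, SLAB_RED_ZONE or __kmem_cache_create().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel constants named in the message. */
#define BYTES_PER_WORD_SKETCH sizeof(void *)
#define REDZONE_ALIGN_SKETCH                                   \
    (sizeof(void *) > _Alignof(unsigned long long)             \
         ? sizeof(void *)                                      \
         : _Alignof(unsigned long long))
#define FLAG_RED_ZONE_SKETCH 0x1UL

static size_t pick_alignment(unsigned long flags, size_t caller_align)
{
    /* Start from the default so the variable is never read uninitialized. */
    size_t ralign = BYTES_PER_WORD_SKETCH;

    /* Red zoning may require a stricter alignment. */
    if (flags & FLAG_RED_ZONE_SKETCH)
        ralign = REDZONE_ALIGN_SKETCH;

    /* A larger alignment requested by the caller wins. */
    if (ralign < caller_align)
        ralign = caller_align;

    return ralign;
}
```

The point of the sketch is the first assignment: every later branch only raises a value that already has a defined default.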
-
- 25 Sep, 2014 1 commit
-
-
Committed by Zefan Li
When we change cpuset.memory_spread_{page,slab}, cpuset will flip the PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently it is not, which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time")
Cc: <stable@vger.kernel.org> # 2.6.31+
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
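The underlying problem is a non-atomic read-modify-write on a word that two threads update at once, so one update can overwrite the other. A minimal userspace sketch of the atomic alternative, using C11 atomics rather than the kernel's bitops, is shown below. struct task_sketch, the bit names and the helper functions are hypothetical and chosen only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SPREAD_PAGE_BIT (1UL << 0) /* hypothetical bit positions */
#define SPREAD_SLAB_BIT (1UL << 1)

/* Hypothetical task structure with a separate word for atomic flags. */
struct task_sketch {
    atomic_ulong atomic_flags;
};

/* Each helper is a single atomic read-modify-write, so concurrent
 * updates of different bits cannot lose each other's changes. */
static void task_set_flag(struct task_sketch *t, unsigned long bit)
{
    atomic_fetch_or(&t->atomic_flags, bit);
}

static void task_clear_flag(struct task_sketch *t, unsigned long bit)
{
    atomic_fetch_and(&t->atomic_flags, ~bit);
}

static bool task_test_flag(struct task_sketch *t, unsigned long bit)
{
    return (atomic_load(&t->atomic_flags) & bit) != 0;
}
```

Keeping such bits in a dedicated word also means that code which updates the ordinary flags word without atomics never touches them.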
-
- 09 Aug, 2014 1 commit
-
-
Committed by Joonsoo Kim
This reverts commit a6406168 ("slab: remove BAD_ALIEN_MAGIC").

Commit a6406168 ("slab: remove BAD_ALIEN_MAGIC") assumes that a system with !CONFIG_NUMA has only one memory node. But the report from Geert shows this to be false. His system, m68k, has many memory nodes and is configured with !CONFIG_NUMA, so it couldn't boot with the above change.

Here is his failure report.

  With latest mainline, I'm getting a crash during bootup on m68k/ARAnyM:

  enable_cpucache failed for radix_tree_node, error 12.
  kernel BUG at /scratch/geert/linux/linux-m68k/mm/slab.c:1522!
  *** TRAP #7 *** FORMAT=0
  Current process id is 0
  BAD KERNEL TRAP: 00000000
  Modules linked in:
  PC: [<0039c92c>] kmem_cache_init_late+0x70/0x8c
  SR: 2200 SP: 00345f90 a2: 0034c2e8
  d0: 0000003d d1: 00000000 d2: 00000000 d3: 003ac942
  d4: 00000000 d5: 00000000 a0: 0034f686 a1: 0034f682
  Process swapper (pid: 0, task=0034c2e8)
  Frame format=0
  Stack from 00345fc4:
          002f69ef 002ff7e5 000005f2 000360fa 0017d806 003921d4 00000000 00000000
          00000000 00000000 00000000 00000000 003ac942 00000000 003912d6
  Call Trace: [<000360fa>] parse_args+0x0/0x2ca
   [<0017d806>] strlen+0x0/0x1a
   [<003921d4>] start_kernel+0x23c/0x428
   [<003912d6>] _sinittext+0x2d6/0x95e

  Code: f7e5 4879 002f 69ef 61ff ffca 462a 4e47 <4879> 0035 4b1c 61ff fff0 0cc4 7005 23c0 0037 fd20 588f 265f 285f 4e75 48e7 301c
  Disabling lock debugging due to kernel taint
  Kernel panic - not syncing: Attempted to kill the idle task!

Although there is an alternative way to fix this issue, such as disabling use of the alien cache on !CONFIG_NUMA, reverting the offending commit seems better to me at this time.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Aug, 2014 13 commits
-
-
Committed by Wang Sheng-Hui
The current struct kmem_cache has no 'lock' field, and slab pages are managed by struct kmem_cache_node, which has a 'list_lock' field.

Clean up the related comment.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
It is better to represent the allocation size as size_t rather than int. So change it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
The BAD_ALIEN_MAGIC value isn't used anymore. So remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Now there is no code that holds two locks simultaneously, since we don't call slab_destroy() while holding any lock. So the lockdep annotation is useless now. Remove it.

v2: don't remove BAD_ALIEN_MAGIC in this patch. It will be removed in the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
I haven't heard that this alien cache lock is contended, but reducing the chance of contention is generally better. And with this change, we can simplify the complex lockdep annotation in the slab code. That will be implemented in the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Now that we have a separate alien_cache structure, it is better to hold the lock on alien_cache while manipulating alien_cache. After that, we don't need the lock on array_cache, so remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Currently, we use array_cache for alien_cache. Although they are mostly similar, there is one difference: the need for a spinlock. We don't need a spinlock for array_cache itself, but to use array_cache for alien_cache, the array_cache structure must have a spinlock. This is needless overhead, so removing it would be better. This patch prepares for that by introducing alien_cache and using it. In the following patch, we remove the spinlock from array_cache.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
Factor out the initialization of the array cache so that it can be used in the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
In free_block(), if freeing an object produces a new free slab and the number of free objects exceeds free_limit, we start to destroy this new free slab while holding the kmem_cache node lock. Holding the lock is unnecessary and, generally, holding a lock for as short a time as possible is a good thing. I have never measured the performance effect of this, but we had better not hold the lock longer than necessary.

Commented by Christoph:
  This is also good because kmem_cache_free is no longer called while
  holding the node lock. So we avoid one case of recursion.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
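The pattern described in this message is to unlink surplus slabs while holding the lock, but to destroy them only after the lock is dropped. Below is a minimal userspace sketch of that pattern. It is not the kernel implementation: struct node_sketch, release_slab() and the atomic_flag spinlock are hypothetical stand-ins for kmem_cache_node, free_block() and the node's list_lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct slab_sketch {
    struct slab_sketch *next;
};

struct node_sketch {
    atomic_flag lock;               /* toy spinlock */
    struct slab_sketch *free_slabs; /* list of completely free slabs */
    size_t nr_free;                 /* length of that list */
    size_t free_limit;              /* maximum number of free slabs kept */
};

static void node_lock(struct node_sketch *n)
{
    while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
        ; /* spin until the lock is released */
}

static void node_unlock(struct node_sketch *n)
{
    atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Add a free slab to the node; returns how many surplus slabs were destroyed. */
static size_t release_slab(struct node_sketch *n, struct slab_sketch *slab)
{
    struct slab_sketch *to_destroy = NULL;
    size_t destroyed = 0;

    node_lock(n);
    slab->next = n->free_slabs;
    n->free_slabs = slab;
    n->nr_free++;
    /* Under the lock: only move surplus slabs to a private list. */
    while (n->nr_free > n->free_limit) {
        struct slab_sketch *victim = n->free_slabs;

        n->free_slabs = victim->next;
        n->nr_free--;
        victim->next = to_destroy;
        to_destroy = victim;
    }
    node_unlock(n);

    /* Outside the lock: the expensive destruction happens here. */
    while (to_destroy) {
        struct slab_sketch *victim = to_destroy;

        to_destroy = victim->next;
        free(victim);
        destroyed++;
    }
    return destroyed;
}
```

Because the private list is only reachable from the local variable, no other thread can see the victims once the lock is released, and the destruction code is free to take other locks.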
-
Committed by Joonsoo Kim
The node doesn't change, so we don't need to retrieve this structure every time we move an object. Maybe the compiler does this optimization, but making it explicit is better.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joonsoo Kim
This patchset does some cleanup and tries to remove the lockdep annotation.

Patches 1~2 are just really minor improvements.
Patches 3~9 are for cleanup and for removing the lockdep annotation.

There are two cases in which the lockdep annotation is needed in SLAB:
1) holding two node locks
2) holding two array cache (alien cache) locks

I looked at the code and found that we can avoid these cases without any negative effect.

1) occurs if freeing an object produces a new free slab and we decide to destroy it. Although we don't need to hold the lock while destroying a slab, the current code does so. Destroying a slab without holding the lock helps reduce lock contention. To do this, I change the implementation so that the new free slab is destroyed after the lock is released.

2) occurs in a similar situation. When we free an object from a non-local node, we put this object into the alien cache while holding the alien cache lock. If the alien cache is full, we try to flush it to the proper node cache, and at this point a new free slab can be produced. Destroying it would then start, and we would free a metadata object that comes from another node. In this case, we need another node's alien cache lock to free the object. This forces us to hold two array cache locks, and then we need the lockdep annotation even though they are always different locks and a deadlock is not possible. To prevent this situation, I use the same approach as in 1).

In this way, we can avoid cases 1) and 2), and can then remove the lockdep annotation. As the short stat shows, this makes the SLAB code much simpler.

This patch (of 9):

slab_should_failslab() is called on every allocation, so optimizing it is reasonable. We normally don't allocate from kmem_cache. It is only used when a new kmem_cache is created, so it is a very rare case. Therefore, add the unlikely macro to help compiler optimization.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
Use the two functions to simplify the code, avoiding the numerous explicit checks for whether a certain node is online.

Get rid of various repeated calculations of kmem_cache_node structures.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Fabian Frederick
init_lock_keys is only called by __init kmem_cache_init_late

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 Jun, 2014 1 commit
-
-
Committed by Joonsoo Kim
Commit b1cb0982 ("change the management method of free objects of the slab") introduced a bug in the slab leak detector ('/proc/slab_allocators'). This detector works as described below.

1. Traverse all objects on all the slabs.
2. Determine whether each object is active or not.
3. If it is active, print who allocated this object.

That commit changed how free objects are managed, so the logic that determines whether an object is active also changed. Previously, we regarded objects in cpu caches as inactive, but with this commit we mistakenly regard objects in cpu caches as active.

This introduces a kernel oops if DEBUG_PAGEALLOC is enabled. If DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who corrupts free memory in the slab. It unmaps the page table mapping if an object is free and maps it if the object is active. When the slab leak detector checks an object in the cpu caches, it mistakenly thinks this object is active, so it tries to access the object memory to retrieve the caller of the allocation. At this point, the page table mapping for this object doesn't exist, so an oops occurs.

The following is the oops message reported by Dave.

It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)

  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  [snip...]
  CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
  task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
  RIP: 0010:[<ffffffffaa1a8f4a>] [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
  RSP: 0018:ffff880076925de0 EFLAGS: 00010002
  RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
  RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
  RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
  R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
  FS: 00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
  DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
  Call Trace:
    leaks_show+0xce/0x240
    seq_read+0x28e/0x490
    proc_reg_read+0x3d/0x80
    vfs_read+0x9b/0x160
    SyS_read+0x58/0xb0
    tracesys+0xd4/0xd9
  Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
  RIP handle_slab+0x8a/0x180

To fix the problem, I introduce an object status buffer on each slab. With this, we can track object status precisely, so the slab leak detector will not access inactive objects and no kernel oops will occur. The memory overhead caused by this fix is only imposed on CONFIG_DEBUG_SLAB_LEAK, which is mainly used for debugging, so the memory overhead isn't a big problem.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Jun, 2014 4 commits
-
-
Committed by Vladimir Davydov
Currently we have two pairs of kmemcg-related functions that are called on slab alloc/free. The first is memcg_{bind,release}_pages, which count the total number of pages allocated on a kmem cache. The second is memcg_{un}charge_slab, which {un}charge slab pages to the kmemcg resource counter.

Let's just merge them to keep the code clean.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Vladimir Davydov
When we create a sl[au]b cache, we allocate kmem_cache_node structures for each online NUMA node. To handle nodes taken online/offline, we register a memory hotplug notifier and allocate/free the kmem_cache_node corresponding to the node that changes its state, for each kmem cache.

To synchronize between the two paths we hold the slab_mutex during both the cache creation/destruction path and while tuning per-node parts of kmem caches in the memory hotplug handler, but that's not quite right, because it does not guarantee that a newly created cache will have all kmem_cache_nodes initialized in case it races with memory hotplug. For instance, in the case of slub:

    CPU0                            CPU1
    ----                            ----
    kmem_cache_create:              online_pages:
     __kmem_cache_create:            slab_memory_callback:
                                      slab_mem_going_online_callback:
                                       lock slab_mutex
                                       for each slab_caches list entry
                                           allocate kmem_cache node
                                       unlock slab_mutex
      lock slab_mutex
      init_kmem_cache_nodes:
       for_each_node_state(node, N_NORMAL_MEMORY)
           allocate kmem_cache node
      add kmem_cache to slab_caches list
      unlock slab_mutex
                                    online_pages (continued):
                                     node_states_set_node

As a result we'll get a kmem cache with not all kmem_cache_nodes allocated.

To avoid issues like that we should hold get/put_online_mems() during the whole kmem cache creation/destruction/shrink paths, just like we deal with cpu hotplug. This patch does the trick.

Note that after it's applied, there is no need to take the slab_mutex for kmem_cache_shrink any more, so it is removed from there.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Vladimir Davydov
We have only a few places where we actually want to charge kmem, so instead of intruding into the general page allocation path with __GFP_KMEMCG it's better to explicitly charge kmem there. All kmem charges will be easier to follow that way.

This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG from memcg caches' allocflags. Instead it makes the slab allocation path call memcg_charge_kmem directly, getting the memcg to charge from the cache's memcg params.

This also eliminates any possibility of misaccounting an allocation going from one memcg's cache to another memcg, because now we always charge slabs against the memcg the cache belongs to. That's why this patch removes the big comment on memcg_kmem_get_cache.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Rientjes
When the slab or slub allocators cannot allocate additional slab pages, they emit diagnostic information to the kernel log such as the current number of slabs, number of objects, active objects, etc. This is always coupled with a page allocation failure warning since it is controlled by !__GFP_NOWARN.

Suppress this out of memory warning if the allocator is configured without debug support. The page allocation failure warning will indicate it is a failed slab allocation, the order, and the gfp mask, so this is only useful to diagnose allocator issues.

Since CONFIG_SLUB_DEBUG is already enabled by default for the slub allocator, there is no functional change with this patch. If debug is disabled, however, the warnings are now suppressed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 May, 2014 2 commits
-
-
Committed by David Miller
If freelist_idx_t is a byte, SLAB_OBJ_MAX_NUM should be 255 not 256, and likewise if freelist_idx_t is a short, then it should be 65535 not 65536.

This was leading to all kinds of random crashes on sparc64 where PAGE_SIZE is 8192. One problem shown was that if spinlock debugging was enabled, we'd get deadlocks in copy_pte_range() or do_wp_page() with the same cpu already holding a lock it shouldn't hold, or the lock belonging to a completely unrelated process.

Fixes: a41adfaa ("slab: introduce byte sized index for the freelist of a slab")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
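The off-by-one can be checked with a few lines of C: the largest value an N-bit unsigned type can hold is 2^N - 1, and the value 2^N itself wraps to 0. The sketch below uses hypothetical names (byte_idx_sketch_t, OBJ_MAX_NUM_SKETCH, store_and_load_idx) standing in for freelist_idx_t and SLAB_OBJ_MAX_NUM.

```c
#include <limits.h>

/* Hypothetical stand-in for a byte-sized freelist index type. */
typedef unsigned char byte_idx_sketch_t;

/* Largest value the index type can represent: 2^bits - 1. */
#define OBJ_MAX_NUM_SKETCH \
    ((1u << (sizeof(byte_idx_sketch_t) * CHAR_BIT)) - 1)

/* Store a value in the index type and read it back. */
static unsigned int store_and_load_idx(unsigned int value)
{
    byte_idx_sketch_t idx = (byte_idx_sketch_t)value; /* truncates to 8 bits */

    return idx;
}
```

On a platform with 8-bit bytes, 255 survives the round trip while 256 comes back as 0, which is why a limit of 256 cannot be represented in a byte-sized type.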
-
Committed by Joonsoo Kim
Commit a41adfaa ("slab: introduce byte sized index for the freelist of a slab") changes the size of the freelist index and also changes the prototype of the accessor function for the freelist index. And there was a mistake.

The mistake is that although it changes the size of the freelist index correctly, it changes the size of the index into the freelist incorrectly. With that patch, a freelist index can be 1 byte or 2 bytes, which means that the number of objects on a slab can be more than 255. So we need more than 1 byte for the index used to find a free object's entry on the freelist. But the above patch makes this index type 1 byte, so a slab which has more than 255 objects cannot work properly and, as a consequence, the system cannot boot.

This issue was reported by Steven King on m68knommu, which would use a 2-byte freelist index:

  https://lkml.org/lkml/2014/4/16/433

The fix is easy. Changing the type of the index into the freelist in the accessor functions is enough to fix this bug. Although 2 bytes is enough, I use 4 bytes since it has no bad effect and makes things easier. This fix was suggested and tested by Steven in his original report.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-and-acked-by: Steven King <sfking@fdwdc.com>
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: James Hogan <james.hogan@imgtec.com>
Tested-by: David Miller <davem@davemloft.net>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
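The distinction between the size of a freelist entry and the size of the position used to look it up can be shown in a short sketch. This is an illustration under assumptions, not the kernel accessors: idx16_sketch_t, entry_at_narrow() and entry_at_wide() are hypothetical names, and the narrow variant models the bug by truncating the position to one byte.

```c
/* Hypothetical 2-byte freelist entry type, as used when a slab can hold
 * more than 255 objects. */
typedef unsigned short idx16_sketch_t;

/* Buggy variant: the position into the freelist is kept in one byte,
 * so positions above 255 silently wrap around. */
static idx16_sketch_t entry_at_narrow(const idx16_sketch_t *freelist,
                                      unsigned int pos)
{
    unsigned char narrow_pos = (unsigned char)pos; /* loses the high bits */

    return freelist[narrow_pos];
}

/* Fixed variant: the position is a full unsigned int. */
static idx16_sketch_t entry_at_wide(const idx16_sketch_t *freelist,
                                    unsigned int pos)
{
    return freelist[pos];
}
```

With 8-bit bytes and a freelist of 512 entries, asking the narrow variant for position 300 actually reads position 44, which is the kind of silent corruption the message describes.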
-
- 11 Apr, 2014 1 commit
-
-
Committed by Dave Hansen
'struct page' has two list_head fields: 'lru' and 'list'. Conveniently, they are unioned together. This means that code can use them interchangeably, which gets horribly confusing like with this nugget from slab.c:

>	list_del(&page->lru);
>	if (page->active == cachep->num)
>		list_add(&page->list, &n->slabs_full);

This patch makes the slab and slub code use page->lru universally instead of mixing ->list and ->lru.

So, the new rule is: page->lru is what you use if you want to keep your page on a list. Don't like the fact that it's not called ->list? Too bad.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
- 08 Apr, 2014 2 commits
-
-
Committed by David Rientjes
PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users. There's no significant performance degradation to checking current->mempolicy rather than current->flags & PF_MEMPOLICY in the allocation path, especially since this is considered unlikely().

Running TCP_RR with netperf-2.4.5 through localhost on a 16 cpu machine with 64GB of memory and without a mempolicy:

	threads		before		after
	16		1249409		1244487
	32		1281786		1246783
	48		1239175		1239138
	64		1244642		1241841
	80		1244346		1248918
	96		1266436		1254316
	112		1307398		1312135
	128		1327607		1326502

Per-process flags are a scarce resource so we should free them up whenever possible and make them available. We'll be using it shortly for memcg oom reserves.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Rientjes
slab_node() is actually a mempolicy function, so rename it to mempolicy_slab_node() to make it clearer that it is used for processes with mempolicies.

At the same time, clean up its code by saving numa_mem_id() in a local variable (since we require a node with memory, not just any node) and remove an obsolete comment that assumes the mempolicy is actually passed into the function.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Apr, 2014 1 commit
-
-
Committed by Mel Gorman
Since put_mems_allowed() is strictly optional (it's a seqcount retry), we don't need to evaluate the function if the allocation was in fact successful, saving an smp_rmb and some loads and comparisons on some relative fast-paths.

Since the naming, get/put_mems_allowed(), does suggest a mandatory pairing, rename the interface, as suggested by Mel, to resemble the seqcount interface.

This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(), where it is important to note that the return value of the latter call is inverted from its previous incarnation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
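The begin/retry pattern described in this message can be sketched in userspace with a plain sequence counter. This is a simplified illustration, not the kernel's seqcount: read_begin_sketch(), read_retry_sketch(), alloc_with_retry() and the two demo allocators are hypothetical, and the sketch ignores the "writer in progress" state a real seqcount handles. The key point is that the retry check is evaluated only when the allocation failed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sequence counter, bumped whenever the allowed set changes. */
static atomic_uint mems_seq_sketch;

static unsigned int read_begin_sketch(void)
{
    return atomic_load_explicit(&mems_seq_sketch, memory_order_acquire);
}

/* Returns true if the set changed since read_begin_sketch(), i.e. retry. */
static bool read_retry_sketch(unsigned int seq)
{
    return atomic_load_explicit(&mems_seq_sketch, memory_order_acquire) != seq;
}

typedef void *(*alloc_fn)(void *ctx);

static void *alloc_with_retry(alloc_fn try_alloc, void *ctx)
{
    unsigned int seq;
    void *obj;

    do {
        seq = read_begin_sketch();
        obj = try_alloc(ctx);
        /* Short-circuit: a successful allocation never pays for the check. */
    } while (!obj && read_retry_sketch(seq));

    return obj;
}

/* Demo allocator: fails once while a concurrent update is simulated. */
struct flaky_ctx {
    int calls;
    int object;
};

static void *flaky_alloc(void *p)
{
    struct flaky_ctx *ctx = p;

    if (ctx->calls++ == 0) {
        atomic_fetch_add(&mems_seq_sketch, 1); /* simulated update */
        return NULL;
    }
    return &ctx->object;
}

/* Demo allocator: always fails without any concurrent update. */
static void *always_fail(void *p)
{
    (void)p;
    return NULL;
}
```

With flaky_alloc the loop runs twice and succeeds; with always_fail it runs once and returns NULL, because a failure with an unchanged sequence is a genuine failure and not a race.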
-
- 01 Apr, 2014 1 commit
-
-
Committed by Jianyu Zhan
Over time the code has changed a lot, leaving outdated comments scattered around that cause confusion instead of facilitating understanding. This patch cleans them up.

This patch also unifies the naming of some variables.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-
- 19 Feb, 2014 1 commit
-
-
Committed by Masanari Iida
This patch fixes the following warnings from make htmldocs:

Warning(/mm/slab.c:1956): No description found for parameter 'page'
Warning(/mm/slab.c:1956): Excess function parameter 'slabp' description in 'slab_destroy'

The function parameter was documented as "slabp" instead of "page".

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 08 Feb, 2014 6 commits
-
-
Committed by Joe Perches
Use the likely() mechanism already around the valid pointer tests to better choose when to memset allocations with __GFP_ZERO to 0.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
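The idea can be sketched outside the kernel as follows. This is a minimal illustration, not the patched code: `model_likely()`, `model_unlikely()`, `MODEL_GFP_ZERO` and `model_alloc()` are hypothetical names, and `__builtin_expect` is a GCC/Clang builtin. The zeroing test is nested inside the branch that already checks the pointer, so the flag check no longer has to re-test the pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Branch hints in the spirit of the kernel macros (GCC/Clang builtin). */
#define model_likely(x)   __builtin_expect(!!(x), 1)
#define model_unlikely(x) __builtin_expect(!!(x), 0)

#define MODEL_GFP_ZERO 0x1u /* hypothetical flag bit for this sketch */

static void *model_alloc(size_t size, unsigned int flags)
{
    void *p;

    assert(size > 0);
    p = malloc(size);

    if (model_likely(p)) {
        /* The pointer is known to be valid here; only the flag is tested. */
        if (model_unlikely(flags & MODEL_GFP_ZERO))
            memset(p, 0, size);
    }
    return p;
}
```

A caller that passes `MODEL_GFP_ZERO` gets zero-filled memory; without the flag the contents are left as `malloc()` returned them.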
-
Committed by Joonsoo Kim
Now that the freelist used for slab management has shrunk, the on-slab management structure can waste a lot of space when the objects of the slab are large.

Consider a slab of 128 byte objects. If on-slab is used, 31 objects fit in the slab. The size of the freelist in this case is 31 bytes, so 97 bytes, that is, more than 75% of the object size, are wasted.

In the case of a slab of 64 byte objects, no space is wasted if we use on-slab. So set the off-slab determining constraint to 128 bytes.

Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
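The arithmetic behind the 128 byte example can be checked with a short sketch. It assumes a one-page slab of 4096 bytes and a one-byte freelist index, and it ignores alignment and any other per-slab overhead; the function names are hypothetical.

```c
#include <assert.h>

#define MODEL_SLAB_BYTES 4096u /* assumption: one 4096-byte page per slab */

/* On-slab: each object also costs idx_size bytes of freelist in the slab. */
static unsigned int on_slab_objects(unsigned int obj_size, unsigned int idx_size)
{
    assert(obj_size > 0);
    return MODEL_SLAB_BYTES / (obj_size + idx_size);
}

/* Bytes left over after the objects and their freelist entries are placed. */
static unsigned int on_slab_waste(unsigned int obj_size, unsigned int idx_size)
{
    unsigned int n = on_slab_objects(obj_size, idx_size);

    return MODEL_SLAB_BYTES - n * (obj_size + idx_size);
}

/* Off-slab: the freelist lives elsewhere, so the whole slab holds objects. */
static unsigned int off_slab_objects(unsigned int obj_size)
{
    assert(obj_size > 0);
    return MODEL_SLAB_BYTES / obj_size;
}
```

Under these assumptions, 128 byte objects give 31 on-slab objects with 97 bytes left over, against 32 objects off-slab. 64 byte objects give 63 on-slab objects with a single byte left over, which is why on-slab remains the better choice there.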
-
Committed by Joonsoo Kim
Currently, the freelist of a slab consists of unsigned int sized indexes. Since most slabs have fewer than 256 objects, such large indexes are unnecessary. For example, consider the minimum kmalloc slab. Its object size is 32 bytes and it consists of one page, so the 256 values of a byte sized index are enough to cover all possible indexes.

There can be some slabs whose object size is 8 bytes. We cannot handle this case with a byte sized index, so we need to restrict the minimum object size. Since these slabs are not major, the memory wasted by them would be negligible.

On some architectures the page size is not 4096 bytes but larger (one example is the 64KB page size on PPC or IA64), so a byte sized index does not fit them. In this case, we will use a two byte sized index.

Below are some numbers for this patch.

* Before *
kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0
kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0
kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0
kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0
kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

* After *
kmalloc-512 521 648 512 8 1 : tunables 54 27 0 : slabdata 81 81 0
kmalloc-256 208 208 256 16 1 : tunables 120 60 0 : slabdata 13 13 0
kmalloc-192 1029 1029 192 21 1 : tunables 120 60 0 : slabdata 49 49 0
kmalloc-96 529 589 128 31 1 : tunables 120 60 0 : slabdata 19 19 0
kmalloc-64 2142 2142 64 63 1 : tunables 120 60 0 : slabdata 34 34 0
kmalloc-128 660 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
kmalloc-32 11716 11780 32 124 1 : tunables 120 60 0 : slabdata 95 95 0
kmem_cache 197 210 192 21 1 : tunables 120 60 0 : slabdata 10 10 0

kmem_caches consisting of objects less than or equal to 256 bytes have one or more objects than before. In the case of kmalloc-32, we have 11 more objects, so 352 bytes (11 * 32) are saved, which is roughly a 9% saving of memory. Of course, this percentage decreases as the number of objects in a slab decreases.

Here are the performance results on my 4 cpu machine.

* Before *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

229,945,138 cache-misses ( +- 0.23% )

11.627897174 seconds time elapsed ( +- 0.14% )

* After *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

218,640,472 cache-misses ( +- 0.42% )

11.504999837 seconds time elapsed ( +- 0.21% )

Cache misses are reduced by this patchset by roughly 5%, and elapsed times are improved by 1%.

Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
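The objects-per-slab figures quoted above for kmalloc-32, kmalloc-64 and kmalloc-192 can be reproduced with a simplified model. It assumes a one-page slab of 4096 bytes with the freelist stored on-slab, and it ignores alignment and off-slab management, so it is not expected to match every row of the listing; the function names are hypothetical.

```c
#include <assert.h>

#define MODEL_SLAB_BYTES 4096u /* assumption: one 4096-byte page per slab */

/* Objects that fit when every object also costs idx_size bytes of freelist. */
static unsigned int model_objs_per_slab(unsigned int obj_size,
                                        unsigned int idx_size)
{
    assert(obj_size > 0);
    return MODEL_SLAB_BYTES / (obj_size + idx_size);
}

/* Extra objects gained by shrinking the index from 4 bytes to 1 byte. */
static unsigned int model_gain(unsigned int obj_size)
{
    return model_objs_per_slab(obj_size, 1) - model_objs_per_slab(obj_size, 4);
}
```

For 32 byte objects the model gives 113 objects with a 4 byte index and 124 with a 1 byte index, a gain of 11 objects or 11 * 32 = 352 bytes per page, in line with the roughly 9% figure in the message.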
-
Committed by Joonsoo Kim
To prepare for implementing a byte sized index for managing the freelist of a slab, we should restrict the number of objects in a slab to be less than or equal to 256, since a byte can only represent 256 different values. Setting the object size to a value greater than or equal to the newly introduced SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab is less than or equal to 256 for a slab with 1 page.

If the page size is larger than 4096, the above assumption would be wrong. In this case, we would fall back on a 2 byte sized index.

If the minimum size of kmalloc is less than 16, we use it as the minimum object size and give up this optimization.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
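The three cases in this message can be expressed as a small calculation. This is a sketch under the assumption that the index width is chosen from the largest possible object count of a one-page slab; `model_max_objs()` and `model_index_bytes()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Upper bound on objects in a slab of slab_bytes with the given minimum size. */
static unsigned int model_max_objs(unsigned int slab_bytes,
                                   unsigned int min_obj_size)
{
    assert(min_obj_size > 0);
    return slab_bytes / min_obj_size;
}

/* Bytes per freelist index: 1 if every index fits in a byte, otherwise 2. */
static unsigned int model_index_bytes(unsigned int slab_bytes,
                                      unsigned int min_obj_size)
{
    return model_max_objs(slab_bytes, min_obj_size) <= 256 ? 1 : 2;
}
```

With a 4096 byte page and a 16 byte minimum object size the bound is exactly 256, so a byte index suffices. A 64KB page, or a minimum object size of 8 bytes, pushes the bound past 256 and the model falls back to a two byte index.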
-
Committed by Joonsoo Kim
In the following patches, the way free objects are read from and written to the freelist changes, so a simple cast no longer works. Therefore, introduce helper functions.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
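A rough user-space sketch of why accessor helpers help here: if all reads and writes of the freelist go through two small functions, the index type can later change in one place. The types and names below (`model_idx_t`, `struct slab_model`, `model_get_free_obj()`, `model_set_free_obj()`) are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Index type in one place; widening it to uint16_t would not touch callers. */
typedef uint8_t model_idx_t;

struct slab_model {
    void *freelist;      /* opaque storage holding the index array */
    unsigned int active; /* number of objects currently handed out */
    unsigned int num;    /* total objects in the slab */
};

/* The only two places that know how the freelist is laid out. */
static model_idx_t model_get_free_obj(const struct slab_model *s,
                                      unsigned int pos)
{
    return ((const model_idx_t *)s->freelist)[pos];
}

static void model_set_free_obj(struct slab_model *s, unsigned int pos,
                               model_idx_t val)
{
    ((model_idx_t *)s->freelist)[pos] = val;
}

/* Hand out the object whose index is stored at position 'active'. */
static model_idx_t model_take(struct slab_model *s)
{
    assert(s->active < s->num);
    return model_get_free_obj(s, s->active++);
}

/* Return an object: store its index just below the active mark. */
static void model_give(struct slab_model *s, model_idx_t obj)
{
    assert(s->active > 0);
    model_set_free_obj(s, --s->active, obj);
}
```

Callers such as `model_take()` and `model_give()` never cast the freelist themselves, so they stay unchanged if `model_idx_t` is redefined.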
-
Committed by Joonsoo Kim
This logic is not simple to understand, so moving it into a separate function helps readability. Additionally, the following patch, which lets the freelist use a differently sized index according to the number of objects, can build on this change.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
-