- 08 Feb, 2008 6 commits
-
-
Committed by Ingo Molnar
Fix checkpatch --file mm/slub.c errors and warnings.

 $ q-code-quality-compare
                          errors   lines of code   errors/KLOC
 mm/slub.c   [before]         22            4204           5.2
 mm/slub.c   [after]           0            4210             0

No code changed:

    text    data     bss     dec     hex  filename
   22195    8634     136   30965    78f5  slub.o.before
   22195    8634     136   30965    78f5  slub.o.after

 md5:
    93cdfbec2d6450622163c590e1064358  slub.o.before.asm
    93cdfbec2d6450622163c590e1064358  slub.o.after.asm

[clameter: rediffed against Pekka's cleanup patch, omitted moves of the name of a function to the start of line]
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Christoph Lameter <clameter@sgi.com>
-
Committed by Nick Piggin
Slub can use the non-atomic version to unlock because other flags will not get modified with the lock held. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Committed by Christoph Lameter
The statistics provided here allow the monitoring of allocator behavior but at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure. The per cpu structure may be extended by the statistics to grow larger than one cacheline which will increase the cache footprint of SLUB.

There is a compile option to enable/disable the inclusion of the runtime statistics and it is off by default.

The slabinfo tool is enhanced to support these statistics via two options:

 -D   Switches the line of information displayed for a slab from size mode to activity mode.
 -A   Sorts the slabs displayed by activity. This allows the display of the slabs most important to the performance of a certain load.
 -r   Report option will report detailed statistics on

Example (tbench load):

 slabinfo -AD   ->Shows the most active slabs

 Name                   Objects      Alloc       Free %Fast
 skbuff_fclone_cache         33  111953835  111953835  99  99
 :0000192                  2666    5283688    5281047  99  99
 :0001024                   849    5247230    5246389  83  83
 vm_area_struct            1349     119642     118355  91  22
 :0004096                    15      66753      66751  98  98
 :0000064                  2067      25297      23383  98  78
 dentry                   10259      28635      18464  91  45
 :0000080                 11004      18950       8089  98  98
 :0000096                  1703      12358      10784  99  98
 :0000128                   762      10582       9875  94  18
 :0000512                   184       9807       9647  95  81
 :0002048                   479       9669       9195  83  65
 anon_vma                   777       9461       9002  99  71
 kmalloc-8                 6492       9981       5624  99  97
 :0000768                   258       7174       6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load. Pretty high load on the 192 sized slab. Look for the aliases:

 slabinfo -a | grep 000192
 :0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.

Looking into the statistics of the skbuff_fclone_cache is possible through

 slabinfo skbuff_fclone_cache   ->-r option implied if cache name is mentioned

 .... Usual output ...

 Slab Perf Counter           Alloc       Free %Al %Fr
 --------------------------------------------------
 Fastpath                111953360  111946981  99  99
 Slowpath                     1044       7423   0   0
 Page Alloc                    272        264   0   0
 Add partial                    25        325   0   0
 Remove partial                 86        264   0   0
 RemoteObj/SlabFrozen          350       4832   0   0
 Total                   111954404  111954404

 Flushes 49   Refill 0
 Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.

skbuff_head_cache:

 Slab Perf Counter           Alloc       Free %Al %Fr
 --------------------------------------------------
 Fastpath                  5297262    5259882  99  99
 Slowpath                     4477      39586   0   0
 Page Alloc                    937        824   0   0
 Add partial                     0       2515   0   0
 Remove partial               1691        824   0   0
 RemoteObj/SlabFrozen         2621       9684   0   0
 Total                     5301739    5299468

 Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Descriptions of the output:

 Total:           The total number of allocations and frees that occurred for a slab.
 Fastpath:        The number of allocations/frees that used the fastpath.
 Slowpath:        Other allocations.
 Page Alloc:      Number of calls to the page allocator as a result of slowpath processing.
 Add Partial:     Number of slabs added to the partial list through free or alloc (occurs during cpuslab flushes).
 Remove Partial:  Number of slabs removed from the partial list as a result of allocations retrieving a partial slab or by a free freeing the last object of a slab.
 RemoteObj/Froz:  How many times remotely freed objects were encountered when a slab was about to be deactivated.
                  Frozen: How many times free was able to skip list processing because the slab was in use as the cpuslab of another processor.
 Flushes:         Number of times the cpuslab was flushed on request (kmem_cache_shrink, may result from races in __slab_alloc).
 Refill:          Number of times we were able to refill the cpuslab from remotely freed objects for the same slab.
 Deactivate:      Statistics on how slabs were deactivated. Shows how they were put onto the partial list.

In general fastpath is very good. Slowpath without partial list processing is also desirable. Any touching of the partial list uses node specific locks which may potentially cause list lock contention.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
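
The %Al and %Fr columns in the sample output are consistent with each counter being reported as a truncated integer percentage of the Total row. The following is a minimal sketch of that arithmetic only; the helper name `pct` is hypothetical and this is not taken from the slabinfo source.

```c
#include <assert.h>

/*
 * Hypothetical helper: express one counter as a whole-number
 * percentage of the total, truncating any fraction (assumption
 * inferred from the sample output, not from the slabinfo source).
 */
static unsigned long long pct(unsigned long long count,
			      unsigned long long total)
{
	assert(count <= total);
	return total ? count * 100 / total : 0;
}
```

With the skbuff_fclone_cache figures above, `pct(111953360, 111954404)` gives the fastpath share of allocations and `pct(1044, 111954404)` the slowpath share.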
-
Committed by Christoph Lameter
Provide an alternate implementation of the SLUB fast paths for alloc and free using cmpxchg_local. The cmpxchg_local fast path is selected for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an interrupt enable/disable sequence. This is known to be true for both x86 platforms so set FAST_CMPXCHG_LOCAL for both arches.

Currently another requirement for the fastpath is that the kernel is compiled without preemption. The restriction will go away with the introduction of a new per cpu allocator and new per cpu operations.

The advantages of a cmpxchg_local based fast path are:

 1. Potentially lower cycle count (30%-60% faster).
 2. There is no need to disable and enable interrupts on the fast path. Currently interrupts have to be disabled and enabled on every slab operation. This is likely avoiding a significant percentage of interrupt off / on sequences in the kernel.
 3. The disposal of freed slabs can occur with interrupts enabled.

The alternate path is realized using #ifdef's. Several attempts to do the same with macros and inline functions resulted in a mess (in particular due to the strange way that local_interrupt_save() handles its argument and due to the need to define macros/functions that sometimes disable interrupts and sometimes do something else).

[clameter: Stripped preempt bits and disabled fastpath if preempt is enabled]
Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
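
The idea of a compare-and-exchange fast path can be illustrated in user space. The sketch below is not the kernel code: it uses C11 `atomic_compare_exchange_*` as a stand-in for cmpxchg_local (which is only atomic with respect to the local cpu), and the names `cpu_freelist`, `fastpath_alloc` and `fastpath_free` are hypothetical. It also assumes the first word of each free object holds the pointer to the next free object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per cpu freelist head. */
struct cpu_freelist {
	_Atomic(void *) head;
};

/* Pop the first free object without disabling interrupts. */
static void *fastpath_alloc(struct cpu_freelist *c)
{
	void *object = atomic_load(&c->head);

	while (object != NULL) {
		/* The first word of a free object links to the next one. */
		void *next = *(void **)object;

		if (atomic_compare_exchange_strong(&c->head, &object, next))
			return object;
		/* On failure 'object' now holds the current head; retry. */
	}
	return NULL;	/* empty: the caller would take the slow path */
}

/* Push an object back onto the freelist. */
static void fastpath_free(struct cpu_freelist *c, void *object)
{
	void *old = atomic_load(&c->head);

	assert(object != NULL);
	do {
		*(void **)object = old;	/* link to the previous head */
	} while (!atomic_compare_exchange_weak(&c->head, &old, object));
}
```

The retry loop replaces the interrupt disable/enable pair: if anything changed the head between the load and the exchange, the exchange fails and the operation is simply repeated.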
-
Committed by Christoph Lameter
We use a NULL pointer on freelists to signal that there are no more objects. However the NULL pointers of all slabs match, in contrast to the pointers to the real objects which are in different ranges for different slab pages. Change the end pointer to be a pointer to the first object and set bit 0. Every slab will then have a different end pointer. This is necessary to ensure that end markers can be matched to the source slab during cmpxchg_local.

Bring back the use of the mapping field by SLUB since we would otherwise have to call a relatively expensive function page_address() in __slab_alloc(). Use of the mapping field allows avoiding a call to page_address() in various other functions as well.

There is no need to change the page_mapping() function since bit 0 is set on the mapping, just as it is for anonymous pages. page_mapping(slab_page) will therefore still return NULL although the mapping field is overloaded.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
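
A per-slab end marker of this kind can be sketched with plain pointer arithmetic. The helper names below are hypothetical, and the sketch assumes objects are at least 2-byte aligned so that bit 0 of a real object pointer is always clear.

```c
#include <assert.h>
#include <stdint.h>

/* End marker for a slab: the address of its first object with bit 0 set. */
static void *slab_end(void *first_object)
{
	/* Real objects must have bit 0 clear for the tag to be unambiguous. */
	assert(((uintptr_t)first_object & 1) == 0);
	return (void *)((uintptr_t)first_object | 1);
}

/* A freelist pointer with bit 0 set terminates the list. */
static int is_end(const void *p)
{
	return (int)((uintptr_t)p & 1);
}

/* Recover the address encoded in an end marker. */
static void *end_to_base(void *end)
{
	return (void *)((uintptr_t)end & ~(uintptr_t)1);
}
```

Because the marker embeds the slab's own address, two empty freelists from different slabs no longer compare equal, which is what a compare-and-exchange on the freelist head needs.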
-
Committed by Christoph Lameter
gcc 4.2 spits out an annoying warning if one casts a const void * pointer to a void * pointer. No warning is generated if the conversion is done through an assignment. Signed-off-by: Christoph Lameter <clameter@sgi.com>
-
- 05 Feb, 2008 7 commits
-
-
Committed by root
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
 (&n->list_lock){-+..}, at: [<ffffffff802935c1>] add_partial+0x31/0xa0
{softirq-on-W} state was registered at:
  [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
  [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
  [<ffffffff8025ad65>] lock_acquire+0x55/0x70
  [<ffffffff802935c1>] add_partial+0x31/0xa0
  [<ffffffff805c76de>] _spin_lock+0x1e/0x30
  [<ffffffff802935c1>] add_partial+0x31/0xa0
  [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
  [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
  [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
  [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
  [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
  [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
  [<ffffffff807f112d>] _sinittext+0x12d/0x140
  [<ffffffffffffffff>] 0xffffffffffffffff

This change isn't really necessary for correctness, but it prevents lockdep from getting upset and then disabling itself.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Lameter <clameter@sgi.com>
-
Committed by Pekka Enberg
This fixes most of the obvious coding style violations in mm/slub.c as reported by checkpatch. Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Lameter <clameter@sgi.com>
-
Committed by Christoph Lameter
Add a parameter to add_partial instead of having separate functions. The parameter allows more detailed control of where the slab page is placed in the partial queues.

If we put slabs back at the front then they are likely to be used immediately for allocations. If they are put at the end then we can maximize the time that the partial slabs spend without being subject to allocations.

When deactivating a slab we can put the slabs that had remote objects freed to them (we can see that because objects were put on the freelist that requires locks) at the end of the list so that the cachelines of remote processors can cool down. Slabs that had objects from the local cpu freed to them (objects exist in the lockless freelist) are put at the front of the list to be reused ASAP in order to exploit the cache hot state of the local cpu.

Patch seems to slightly improve tbench speed (1-2%).
Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
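
The head-or-tail choice can be sketched with a small circular doubly linked list in user space. This is an illustration only: the `struct node` type and helpers are hypothetical stand-ins, and the real add_partial also takes the node's list_lock and updates counters, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list head / list entry. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct node *n, struct node *prev,
			   struct node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/*
 * tail != 0: queue the slab at the end so it rests before reuse.
 * tail == 0: queue it at the front so it is reused while cache hot.
 */
static void add_partial(struct node *partial, struct node *slab, int tail)
{
	assert(slab != partial);
	if (tail)
		insert_between(slab, partial->prev, partial);
	else
		insert_between(slab, partial, partial->next);
}
```

A single function with a flag keeps the two placement policies next to each other, so the caller states its intent at the call site.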
-
Committed by Christoph Lameter
The NUMA defrag works by allocating objects from partial slabs on remote nodes. Rename it to remote_node_defrag_ratio to be clear about this. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Committed by Christoph Lameter
Move the counting function for objects in partial slabs so that it is placed before kmem_cache_shrink. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Committed by Christoph Lameter
If CONFIG_SYSFS is set then free the kmem_cache structure when sysfs tells us it's okay. Otherwise there is the danger (as pointed out by Al Viro) that sysfs thinks the kobject still exists after kmem_cache_destroy() removed it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka J Enberg <penberg@cs.helsinki.fi>
-
Committed by Harvey Harrison
Introduce 'len' at outer level:
 mm/slub.c:3406:26: warning: symbol 'n' shadows an earlier one
 mm/slub.c:3393:6: originally declared here

No need to declare new node:
 mm/slub.c:3501:7: warning: symbol 'node' shadows an earlier one
 mm/slub.c:3491:6: originally declared here

No need to declare new x:
 mm/slub.c:3513:9: warning: symbol 'x' shadows an earlier one
 mm/slub.c:3492:6: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
-
- 25 Jan, 2008 5 commits
-
-
Committed by Greg Kroah-Hartman
This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Christoph Lameter <clameter@sgi.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Committed by Greg Kroah-Hartman
kernel_kset does not need to be a kset, but a much simpler kobject now that we have kobj_attributes. We also rename kernel_kset to kernel_kobj to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Committed by Greg Kroah-Hartman
/sys/kernel is where these things should go. Also updated the documentation and tool that used this directory. Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Committed by Greg Kroah-Hartman
Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Committed by Greg Kroah-Hartman
We don't need a "default" ktype for a kset. We should set this explicitly every time for each kset. This change is needed so that we can make ksets dynamic, and cleans up one of the odd, undocumented assumptions that the kset/kobject/ktype model has. This patch is based on a lot of help from Kay Sievers. Nasty bug in the block code was found by Dave Young <hidave.darkstar@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
- 03 Jan, 2008 1 commit
-
-
Committed by Linus Torvalds
Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 Jan, 2008 1 commit
-
-
Committed by Pekka J Enberg
This adds a read-only /proc/slabinfo file on SLUB, that makes slabtop work. [ mingo@elte.hu: build fix. ] Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Dec, 2007 1 commit
-
-
Committed by Christoph Lameter
Increase the minimum number of partial slabs to keep around and put partial slabs to the end of the partial queue so that they can add more objects. Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Dec, 2007 1 commit
-
-
Committed by Christoph Lameter
Remove a recently added useless masking of GFP_ZERO. GFP_ZERO is already masked out in new_slab() (See how it calls allocate_slab). No need to do it twice. This reverts the SLUB parts of 7fd27255. Cc: Matt Mackall <mpm@selenic.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Dec, 2007 1 commit
-
-
Committed by Linus Torvalds
Both slob and slub react to __GFP_ZERO by clearing the allocation, which means that passing the GFP_ZERO bit down to the page allocator is just wasteful and pointless. Acked-by: Matt Mackall <mpm@selenic.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 Dec, 2007 1 commit
-
-
Committed by Vegard Nossum
I can't pass memory allocated by kmalloc() to ksize() if it is allocated by SLUB allocator and size is larger than (I guess) PAGE_SIZE / 2. The error of ksize() seems to be that it does not check if the allocation was made by SLUB or the page allocator. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Lameter <clameter@sgi.com>, Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Nov, 2007 1 commit
-
-
Committed by Denis Cheng
Since the macro "for_each_object" was introduced, the "end" variable is no longer used. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 Nov, 2007 1 commit
-
-
Committed by Christoph Lameter
Fix the memory leak that may occur when we attempt to reuse a cpu_slab that was allocated while we reenabled interrupts in order to be able to grow a slab cache. The per cpu freelist may contain objects and in that situation we may overwrite the per cpu freelist pointer, losing objects. This only occurs if we find that the concurrently allocated slab fits our allocation needs. If we simply always deactivate the slab then the freelist will be properly reintegrated and the memory leak will go away. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 Oct, 2007 1 commit
-
-
Committed by Al Viro
nr_slabs is atomic_long_t, not atomic_t. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Oct, 2007 1 commit
-
-
Committed by Yasunori Goto
Fix a panic due to a NULL pointer access to kmem_cache_node in discard_slab() after memory online. When memory online is called, kmem_cache_nodes are created for all SLUBs for the new node whose memory is available. slab_mem_going_online_callback() is called to make kmem_cache_node() in the callback of the memory online event. If it (or other callbacks) fails, then slab_mem_offline_callback() is called for rollback. In memory offline, slab_mem_going_offline_callback() is called to shrink all slub caches, then slab_mem_offline_callback() is called later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: locking fix] [akpm@linux-foundation.org: build fix] Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Oct, 2007 12 commits
-
-
Committed by Christoph Lameter
Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer.

Convert
 ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
 ctor(struct kmem_cache *s, void *object)
throughout the kernel.

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
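
A constructor written against the new argument order might look like the sketch below. The one-field `struct kmem_cache` is a hypothetical user-space stand-in so the example compiles on its own, and `struct foo` and `foo_ctor` are illustrative names, not kernel code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct kmem_cache. */
struct kmem_cache {
	size_t objsize;
};

/* Illustrative object type managed by the cache. */
struct foo {
	int refcount;
	char name[16];
};

/* New-style constructor: cache first, object second, no flags argument. */
static void foo_ctor(struct kmem_cache *s, void *object)
{
	struct foo *f = object;

	assert(s->objsize >= sizeof(*f));
	memset(f, 0, sizeof(*f));
	f->refcount = 1;
}
```

Under the old convention the same function would have been declared as `foo_ctor(void *object, struct kmem_cache *s, unsigned long flags)` with `flags` left unused.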
-
Committed by Christoph Lameter
Move irq handling out of new slab into __slab_alloc. That is useful for Mathieu's cmpxchg_local patchset and also allows us to remove the crude local_irq_off in early_kmem_cache_alloc(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrew Morton
It's a short-lived allocation. Cc: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
We touch a cacheline in the kmem_cache structure for zeroing to get the size. However, the hot paths in slab_alloc and slab_free do not reference any other fields in kmem_cache, so we may have to just bring in the cacheline for this one access. Add a new field to kmem_cache_cpu that contains the object size. That cacheline must already be used in the hotpaths. So we save one cacheline on every slab_alloc if we zero. We need to update the kmem_cache_cpu object size if an aliasing operation changes the objsize of a non debug slab. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
The kmem_cache_cpu structures introduced are currently an array placed in the kmem_cache struct. This means the kmem_cache_cpu structures are overwhelmingly on the wrong node for systems with a higher number of nodes. These are performance critical structures since the per node information has to be touched for every alloc and free in a slab.

In order to place the kmem_cache_cpu structure optimally we put an array of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB).

However, the kmem_cache_cpu structures can now be allocated in a more intelligent way. We would like to put per cpu structures for the same cpu but different slab caches in cachelines together to save space and decrease the cache footprint. However, the slab allocator itself controls only allocations per node. We set up a simple per cpu array for every processor with 100 per cpu structures which is usually enough to get them all set up right. If we run out then we fall back to kmalloc_node. This also solves the bootstrap problem since we do not have to use slab allocator functions early in boot to get memory for the small per cpu structures.

Pro:
 - NUMA aware placement improves memory performance
 - All global structures in struct kmem_cache become readonly
 - Dense packing of per cpu structures reduces cacheline footprint in SMP and NUMA.
 - Potential avoidance of exclusive cacheline fetches on the free and alloc hotpath since multiple kmem_cache_cpu structures are in one cacheline. This is particularly important for the kmalloc array.

Cons:
 - Additional reference to one read only cacheline (per cpu array of pointers to kmem_cache_cpu) in both slab_alloc() and slab_free().

[akinobu.mita@gmail.com: fix cpu hotplug offline/online path]
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "Pekka Enberg" <penberg@cs.helsinki.fi> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
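
The "fixed array first, general allocator as fallback" scheme can be sketched in user space as follows. This is a simplification under stated assumptions: `struct pcpu`, `struct pcpu_pool`, `alloc_pcpu` and `from_pool` are hypothetical names, `malloc` stands in for kmalloc_node, and slots are handed out by a simple bump counter with no provision for returning them to the pool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_POOLED 100	/* size of the preallocated array, as in the text */

/* Hypothetical stand-in for a small per cpu structure. */
struct pcpu {
	void *freelist;
	int node;
};

/* One pool per processor: a dense array plus a count of slots handed out. */
struct pcpu_pool {
	struct pcpu slots[NR_POOLED];
	int used;
};

/* Take a slot from the array; fall back to the heap once it is exhausted. */
static struct pcpu *alloc_pcpu(struct pcpu_pool *pool)
{
	assert(pool->used >= 0 && pool->used <= NR_POOLED);
	if (pool->used < NR_POOLED)
		return &pool->slots[pool->used++];
	return malloc(sizeof(struct pcpu));	/* stand-in for kmalloc_node */
}

/* True if p lies inside the pool's array rather than on the heap. */
static int from_pool(const struct pcpu_pool *pool, const struct pcpu *p)
{
	uintptr_t lo = (uintptr_t)&pool->slots[0];
	uintptr_t hi = (uintptr_t)&pool->slots[NR_POOLED];
	uintptr_t a = (uintptr_t)p;

	return a >= lo && a < hi;
}
```

Because the array is static, the first allocations need no working allocator at all, which is the bootstrap property the commit message points out.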
-
Committed by Christoph Lameter
Set c->node to -1 if we allocate from a debug slab, instead of checking SlabDebug, which requires access to the page struct cacheline. Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
We need the offset from the page struct during slab_alloc and slab_free. In both cases we also reference the cacheline of the kmem_cache_cpu structure. We can therefore move the offset field into the kmem_cache_cpu structure, freeing up 16 bits in the page struct.

Moving the offset allows an allocation from slab_alloc() without touching the page struct in the hot path.

The only thing left in slab_free() that touches the page struct cacheline for per cpu freeing is the checking of SlabDebug(page). The next patch deals with that.

Use the available 16 bits to broaden page->inuse. More than 64k objects per slab become possible and we can get rid of the checks for that limitation.

No need anymore to shrink the order of slabs if we boot with 2M sized slabs (slub_min_order=9).

No need anymore to switch off the offset calculation for very large slabs since the field in the kmem_cache_cpu structure is 32 bits and so the offset field can now handle slab sizes of up to 8GB.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
After moving the lockless_freelist to kmem_cache_cpu we no longer need page->lockless_freelist. Restructure the use of the struct page fields in such a way that we never touch the mapping field. This in turn allows us to remove the special casing of SLUB when determining the mapping of a page (needed for corner cases of virtual cache machines that need to flush caches of processors mapping a page). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
A remote free may access the same page struct that also contains the lockless freelist for the cpu slab. If objects have a short lifetime and are freed by a different processor then remote frees back to the slab from which we are currently allocating are frequent. The cacheline with the page struct needs to be repeatedly acquired in exclusive mode by both the allocating thread and the freeing thread. If this is frequent enough then performance will suffer because of cacheline bouncing.

This patchset puts the lockless_freelist pointer in its own cacheline. In order to make that happen we introduce a per cpu structure called kmem_cache_cpu.

Instead of keeping an array of pointers to page structs we now keep an array to a per cpu structure that--among other things--contains the pointer to the lockless freelist. The freeing thread can then keep possession of exclusive access to the page struct cacheline while the allocating thread keeps its exclusive access to the cacheline containing the per cpu structure.

This works as long as the allocating cpu is able to service its request from the lockless freelist. If the lockless freelist runs empty then the allocating thread needs to acquire exclusive access to the cacheline with the page struct and lock the slab.

The allocating thread will then check if new objects were freed to the per cpu slab. If so it will keep the slab as the cpu slab and continue with the recently remote freed objects. So the allocating thread can take a series of just freed remote pages and dish them out again. Ideally allocations could be just recycling objects in the same slab this way which will lead to an ideal allocation / remote free pattern.

The number of objects that can be handled in this way is limited by the capacity of one slab. Increasing slab size via slub_min_objects/slub_max_order may increase the number of objects and therefore performance.

If the allocating thread runs out of objects and finds that no objects were put back by the remote processor then it will retrieve a new slab (from the partial lists or from the page allocator) and start with a whole new set of objects while the remote thread may still be freeing objects to the old cpu slab. This may then repeat until the new slab is also exhausted. If remote freeing has freed objects in the earlier slab then that earlier slab will now be on the partial freelist and the allocating thread will pick that slab next for allocation. So the loop is extended. However, both threads need to take the list_lock to make the swizzling via the partial list happen.

It is likely that this kind of scheme will keep the objects being passed around to a small set that can be kept in the cpu caches, leading to increased performance.

More code cleanups become possible:
 - Instead of passing a cpu we can now pass a kmem_cache_cpu structure around. Allows reducing the number of parameters to various functions.
 - Can define a new node_match() function for NUMA to encapsulate locality checks.

Effect on allocations:

 Cachelines touched before this patch:
  Write: page cache struct and first cacheline of object
 Cachelines touched after this patch:
  Write: kmem_cache_cpu cacheline and first cacheline of object
  Read: page cache struct (but see later patch that avoids touching that cacheline)

The handling when the lockless alloc list runs empty gets to be a bit more complicated since another cacheline has now to be written to. But that is halfway out of the hot path.

Effect on freeing:

 Cachelines touched before this patch:
  Write: page_struct and first cacheline of object
 Cachelines touched after this patch depending on how we free:
  Write(to cpu_slab): kmem_cache_cpu struct and first cacheline of object
  Write(to other): page struct and first cacheline of object
  Read(to cpu_slab): page struct to id slab etc. (but see later patch that avoids touching the page struct on free)
  Read(to other): cpu local kmem_cache_cpu struct to verify it's not the cpu slab.

Summary:

Pro:
 - Distinct cachelines so that concurrent remote frees and local allocs on a cpuslab can occur without cacheline bouncing.
 - Avoids potential bouncing cachelines because of neighboring per cpu pointer updates in kmem_cache's cpu_slab structure since it now grows to a cacheline (Therefore remove the comment that talks about that concern).

Cons:
 - Freeing objects now requires the reading of one additional cacheline. That can be mitigated for some cases by the following patches but it's not possible to completely eliminate these references.
 - Memory usage grows slightly. The size of each per cpu object is blown up from one word (pointing to the page_struct) to one cacheline with various data. So this is NR_CPUS*NR_SLABS*L1_BYTES more memory use. Let's say NR_SLABS is 100 and a cache line size of 128 then we have just increased SLAB metadata requirements by 12.8k per cpu. (Another later patch reduces these requirements)

Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
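
The memory estimate in the last "Cons" item is a plain product, NR_CPUS*NR_SLABS*L1_BYTES. As a worked check of the quoted figure, a minimal sketch follows; the function name is hypothetical and "12.8k" is read here as 12800 bytes.

```c
#include <assert.h>

/* Extra metadata in bytes: one cacheline per slab cache per cpu. */
static unsigned long percpu_metadata_bytes(unsigned long nr_cpus,
					   unsigned long nr_slabs,
					   unsigned long l1_bytes)
{
	assert(l1_bytes > 0);
	return nr_cpus * nr_slabs * l1_bytes;
}
```

For one cpu, 100 slab caches and 128-byte cachelines this is 100 * 128 = 12800 bytes, matching the 12.8k per cpu stated above; the total scales linearly with the number of cpus.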
-
Committed by Mel Gorman
This patch marks a number of allocations that are either short-lived such as network buffers or are reclaimable such as inode allocations. When something like updatedb is called, long-lived and unmovable kernel allocations tend to be spread throughout the address space which increases fragmentation. This patch groups these allocations together as much as possible by adding a new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be reclaimed on demand, but not moved. i.e. they can be migrated by deleting them and re-reading the information from elsewhere. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Lameter
The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP flags:

 GFP_RECLAIM_MASK     Flags used to control page allocator reclaim behavior.
 GFP_CONSTRAINT_MASK  Flags used to limit where allocations can occur.
 GFP_SLAB_BUG_MASK    Flags that the slab allocator BUG()s on.

These replace the uses of GFP_LEVEL mask in the slab allocators and in vmalloc.c.

The use of the flags not included in these sets may occur as a result of a slab allocation standing in for a page allocation when constructing scatter gather lists. Extraneous flags are cleared and not passed through to the page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will now be ignored if passed to a slab allocator.

Change the allocation of allocator meta data in SLAB and vmalloc to not pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the __GFP_THISNODE flag for such allocations. Generalize that to also cover vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.

The impact of allocator metadata placement on access latency to the cachelines of the object itself is minimal since metadata is only referenced on alloc and free. The attempt is still made to place the meta data optimally but we consistently allow fallback both in SLAB and vmalloc (SLUB does not need to allocate metadata like that).

Allocator metadata may serve multiple in kernel users and thus should not be subject to the limitations arising from a single allocation context.

[akpm@linux-foundation.org: fix fallback_alloc()]
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
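
The three-way split can be pictured as simple bit masking. The sketch below is illustrative only: the `DEMO_*` flag names, their bit values and the membership of each mask are assumptions made up for the example and do not reproduce the kernel's definitions; `assert` stands in for BUG().

```c
#include <assert.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define DEMO_WAIT      (1u << 0)	/* reclaim behavior */
#define DEMO_THISNODE  (1u << 1)	/* placement constraint */
#define DEMO_HARDWALL  (1u << 2)	/* placement constraint */
#define DEMO_COLD      (1u << 3)	/* extraneous for a slab allocator */
#define DEMO_COMP      (1u << 4)	/* extraneous for a slab allocator */
#define DEMO_HIGHMEM   (1u << 5)	/* never valid for slab memory */

#define DEMO_RECLAIM_MASK     (DEMO_WAIT)
#define DEMO_CONSTRAINT_MASK  (DEMO_THISNODE | DEMO_HARDWALL)
#define DEMO_SLAB_BUG_MASK    (DEMO_HIGHMEM)

/* Flags passed on to the page allocator: extraneous bits are dropped. */
static unsigned int flags_for_page_allocator(unsigned int flags)
{
	assert((flags & DEMO_SLAB_BUG_MASK) == 0);	/* stand-in for BUG() */
	return flags & (DEMO_RECLAIM_MASK | DEMO_CONSTRAINT_MASK);
}

/* Flags for allocator metadata: placement constraints are removed. */
static unsigned int flags_for_metadata(unsigned int flags)
{
	return flags & ~DEMO_CONSTRAINT_MASK;
}
```

Naming each mask after its purpose makes the intent visible at the point of use, which is the stated motivation for retiring the single opaque mask.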
-
Committed by Christoph Lameter
Simply switch all for_each_online_node to for_each_node_state(NORMAL_MEMORY). That way SLUB only operates on nodes with regular memory. Any allocation attempt on a memoryless node or a node with just highmem will fail, whereupon SLUB will fetch memory from a nearby node (depending on how memory policies and cpuset describe fallback). Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-