- 27 1月, 2007 4 次提交
-
-
由 Trond Myklebust 提交于
NFS can handle the case where invalidate_inode_pages2_range() fails, so the premise behind commit 8258d4a5 is now gone. Remove the WARN_ON_ONCE() which is causing users grief as we can see from http://bugzilla.kernel.org/show_bug.cgi?id=7826Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
This patch fixes core dumps to include the vDSO vma, which is left out now. It removes the special-case core writing macros, which were not doing the right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the vma; there is no need for the fixmap page to be installed. It handles the CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from get_gate_vma after real vmas in the same way the /proc/PID/maps code does. This changes core dumps so they no longer include the non-PT_LOAD phdrs from the vDSO. I made the change to add them in the first place, but in turned out that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner to leave them out, and just let the phdrs inside the vDSO image speak for themselves. Signed-off-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
This patch fixes the initialization of gate_vma.vm_flags and gate_vma.vm_page_prot to reflect reality. This makes the "[vdso]" line in /proc/PID/maps correctly show r-xp instead of ---p, when gate_vma is used (CONFIG_COMPAT_VDSO on i386). Signed-off-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
It's not pretty, but it appears that ext3 with data=journal will clean pages without ever actually telling the VM that they are clean. This, in turn, will result in the VM (and balance_dirty_pages() in particular) to never realize that the pages got cleaned, and wait forever for an event that already happened. Technically, this seems to be a problem with ext3 itself, but it used to be hidden by 'try_to_free_buffers()' noticing this situation on its own, and just working around the filesystem problem. This commit re-instates that hack, in order to avoid a regression for the 2.6.20 release. This fixes bugzilla 7844: http://bugzilla.kernel.org/show_bug.cgi?id=7844 Peter Zijlstra points out that we should probably retain the debugging code that this removes from cancel_dirty_page(), and I agree, but for the imminent release we might as well just silence the warning too (since it's not a new bug: anything that triggers that warning has been around forever). Acked-by: NRandy Dunlap <rdunlap@xenotime.net> Acked-by: NJens Axboe <jens.axboe@oracle.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 1月, 2007 1 次提交
-
-
由 Christoph Lameter 提交于
Currently one can specify an arbitrary node mask to mbind that includes nodes not allowed. If that is done with an interleave policy then we will go around all the nodes. Those outside of the currently allowed cpuset will be redirected to the border nodes. Interleave will then create imbalances at the borders of the cpuset. This patch restricts the nodes to the currently allowed cpuset. The RFC for this patch was discussed at http://marc.theaimsgroup.com/?t=116793842100004&r=1&w=2Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Paul Jackson <pj@sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 1月, 2007 1 次提交
-
-
由 Jens Axboe 提交于
Currently we issue a bounce trace when __blk_queue_bounce() is called, but that merely means that the device has a lower dma mask than the higher pages in the system. The bio itself may still be lower pages. So move the bounce trace into __blk_queue_bounce(), when we know there will actually be page bouncing. Signed-off-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 1月, 2007 2 次提交
-
-
由 Trond Myklebust 提交于
NFS: Fix race in nfs_release_page() invalidate_inode_pages2() may find the dirty bit has been set on a page owing to the fact that the page may still be mapped after it was locked. Only after the call to unmap_mapping_range() are we sure that the page can no longer be dirtied. In order to fix this, NFS has hooked the releasepage() method and tries to write the page out between the call to unmap_mapping_range() and the call to remove_mapping(). This, however leads to deadlocks in the page reclaim code, where the page may be locked without holding a reference to the inode or dentry. Fix is to add a new address_space_operation, launder_page(), which will attempt to write out a dirty page without releasing the page lock. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Also, the bare SetPageDirty() can skew all sort of accounting leading to other nasties. [akpm@osdl.org: cleanup] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Dave Hansen 提交于
Fix an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ppc64, ia64, s390, i386 and x86_64. Acked-by: NArnd Bergmann <arndb@de.ibm.com> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Acked-by: NAndy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 09 1月, 2007 1 次提交
-
-
由 Russell King 提交于
Since get_user_pages() may be used with processes other than the current process and calls flush_anon_page(), flush_anon_page() has to cope in some way with non-current processes. It may not be appropriate, or even desirable to flush a region of virtual memory cache in the current process when that is different to the process that we want the flush to occur for. Therefore, pass the vma into flush_anon_page() so that the architecture can work out whether the 'vmaddr' is for the current process or not. Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
- 06 1月, 2007 6 次提交
-
-
由 Andrew Morton 提交于
At the end of shrink_all_memory() we forget to recalculate lru_pages: it can be zero. Fix that up, and add a helper function for this operation too. Also, recalculate lru_pages each time around the inner loop to get the balancing correct. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
These days, if you swapoff when there isn't enough memory, OOM killer gives "BUG: scheduling while atomic" and the machine hangs: badness() needs to do its PF_SWAPOFF return after the task_unlock (tasklist_lock is also held here, so p isn't going to be freed: PF_SWAPOFF might get turned off at any moment, but that doesn't really matter). Signed-off-by: NHugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
Both process_zones() and drain_node_pages() check for populated zones before touching pagesets. However, __drain_pages does not do so, This may result in a NULL pointer dereference for pagesets in unpopulated zones if a NUMA setup is combined with cpu hotplug. Initially the unpopulated zone has the pcp pointers pointing to the boot pagesets. Since the zone is not populated the boot pageset pointers will not be changed during page allocator and slab bootstrap. If a cpu is later brought down (first call to __drain_pages()) then the pcp pointers for cpus in unpopulated zones are set to NULL since __drain_pages does not first check for an unpopulated zone. If the cpu is then brought up again then we call process_zones() which will ignore the unpopulated zone. So the pageset pointers will still be NULL. If the cpu is then again brought down then __drain_pages will attempt to drain pages by following the NULL pageset pointer for unpopulated zones. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
pdflush hit the BUG_ON(!PageSlab(page)) in kmem_freepages called from fallback_alloc: cache_grow already freed those pages when alloc_slabmgmt failed. But it wouldn't have freed them if __GFP_NO_GROW, so make sure fallback_alloc doesn't waste its time on that case. Signed-off-by: NHugh Dickins <hugh@veritas.com> Acked-by: NChristoph Lameter <clameter@sgi.com> Acked-by: NPekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Mundt 提交于
At the moment the inode/dentry cache hash tables (common by way of alloc_large_system_hash()) are incorrectly sized by their respective detection logic when we attempt to use large base pages on systems with little memory. This results in odd behaviour when using a 64kB PAGE_SIZE, such as: Dentry cache hash table entries: 8192 (order: -1, 32768 bytes) Inode-cache hash table entries: 4096 (order: -2, 16384 bytes) The mount cache hash table is seemingly the only one that gets this right by directly taking PAGE_SIZE in to account. The following patch attempts to catch the bogus values and round it up to at least 0-order. Signed-off-by: NPaul Mundt <lethal@linux-sh.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rafael J. Wysocki 提交于
In the kernels later than 2.6.19 there is a regression that makes swsusp fail if the resume device is not explicitly specified. It can be fixed by adding an additional parameter to mm/swapfile.c:swap_type_of() allowing us to pass the (struct block_device *) corresponding to the first available swap back to the caller. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 31 12月, 2006 4 次提交
-
-
由 Shantanu Goel 提交于
Fix a rather obvious buglet. Noticed while instrumenting the VM using /proc/vmstat. Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
(akpm: macros are wonderful) Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Dimitri Gorokhovik 提交于
Recent cleanup of slab.h broke SLOB allocator: the routine kmem_cache_init has now the __init attribute for both slab.c and slob.c. This routine cannot be removed after init in the case of slob.c -- it serves as a timer callback. Provide a separate timer callback routine, call it once from kmem_cache_init, keep the __init attribute on the latter. Signed-off-by: NDimitri Gorokhovik <dimitri.gorokhovik@free.fr> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 KAMEZAWA Hiroyuki 提交于
constrained_alloc(), which is called to detect where oom is from, checks passed zone_list(). If zone_list doesn't include all nodes, it thinks oom is from mempolicy. But there is memory-less-node. memory-less-node's zones are never included in zonelist[]. contstrained_alloc() should get memory_less_node into count. Otherwise, it always thinks 'oom is from mempolicy'. This means that current process dies at any time. This patch fix it. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 12月, 2006 1 次提交
-
-
由 Linus Torvalds 提交于
The VM layer (on the face of it, fairly reasonably) expected that when it does a ->writepage() call to the filesystem, it would write out the full page at that point in time. Especially since it had earlier marked the whole page dirty with "set_page_dirty()". But that isn't actually the case: ->writepage() does not actually write a page, it writes the parts of the page that have been explicitly marked dirty before, *and* that had not got written out for other reasons since the last time we told it they were dirty. That last caveat is the important one. Which _most_ of the time ends up being the whole page (since we had called "set_page_dirty()" on the page earlier), but if the filesystem had done any dirty flushing of its own (for example, to honor some internal write ordering guarantees), it might end up doing only a partial page IO (or none at all) when ->writepage() is actually called. That is the correct thing in general (since we actually often _want_ only the known-dirty parts of the page to be written out), but the shared dirty page handling had implicitly forgotten about these details, and had a number of cases where it was doing just the "->writepage()" part, without telling the low-level filesystem that the whole page might have been re-dirtied as part of being mapped writably into user space. Since most of the time the FS did actually write out the full page, we didn't notice this for a loong time, and this needed some really odd patterns to trigger. But it caused occasional corruption with rtorrent and with the Debian "apt" database, because both use shared mmaps to update the end result. This fixes it. Finally. After way too much hair-pulling. Acked-by: NNick Piggin <nickpiggin@yahoo.com.au> Acked-by: NMartin J. Bligh <mbligh@google.com> Acked-by: NMartin Michlmayr <tbm@cyrius.com> Acked-by: NMartin Johansson <martin@fatbob.nu> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NAndrei Popa <andrei.popa@i-neo.ro> Cc: High Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@osdl.org>, Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Gordon Farquharson <gordonfarquharson@gmail.com> Cc: Guillaume Chazarain <guichaz@yahoo.fr> Cc: Theodore Tso <tytso@mit.edu> Cc: Kenneth Cheng <kenneth.w.chen@intel.com> Cc: Tobias Diedrich <ranma@tdiedrich.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 12月, 2006 1 次提交
-
-
由 Linus Torvalds 提交于
Make cancel_dirty_page() act more like all the other dirty and writeback accounting functions: test for "mapping" being NULL, and do the NR_FILE_DIRY accounting purely based on mapping_cap_account_dirty()). Also, add it to the exports, so that modular filesystems can use it. Acked-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 23 12月, 2006 7 次提交
-
-
由 Peter Zijlstra 提交于
- add flush_cache_page() for all those virtual indexed cache architectures. - handle s390. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Add more debugging in the rmap code in an attempt to locate to source of the occasional "mapcount went negative" assertions. Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nigel Cunningham 提交于
The version of mm/vmscan.c in Linus' current tree has swapped parameters in the shrink_all_zones declaration and call, used by the various suspend-to-disk implementations. This doesn't seem to have any great adverse effect, but it's clearly wrong. Signed-off-by: NNigel Cunningham <nigel@suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in 2.6.20-rc1. Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
The declaration of kmem_ptr_validate in slab.h does not match the one in slab.c. Remove the fastcall attribute (this is the only use in slab.c). Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Badari Pulavarty 提交于
Ran into BUG() while doing madvise(REMOVE) testing. If we are punching a hole into shared memory segment using madvise(REMOVE) and the entire hole is below the indirect blocks, we hit following assert. BUG_ON(limit <= SHMEM_NR_DIRECT); Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
Only (un)account for IO and page-dirtying for devices which have real backing store (ie: not tmpfs or ramdisks). Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 22 12月, 2006 2 次提交
-
-
由 Andrew Morton 提交于
truncate presently invalidates the dirty page's buffer_heads then shoots down the page. But try_to_free_buffers() will now bale out because the page is dirty. Net effect: the LRU gets filled with dirty pages which have invalidated buffer_heads attached. They have no ->mapping and hence cannot be cleaned. The machine leaks memory at an enormous rate. Fix this by cleaning the page before running try_to_free_buffers(), so try_to_free_buffers() can do its work. Also, remember to do dirty-page-acoounting in cancel_dirty_page() so the machine won't wedge up trying to write non-existent dirty pages. Probably still wrong, but now less so. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
They were horribly easy to mis-use because of their tempting naming, and they also did way more than any users of them generally wanted them to do. A dirty page can become clean under two circumstances: (a) when we write it out. We have "clear_page_dirty_for_io()" for this, and that function remains unchanged. In the "for IO" case it is not sufficient to just clear the dirty bit, you also have to mark the page as being under writeback etc. (b) when we actually remove a page due to it becoming inaccessible to users, notably because it was truncate()'d away or the file (or metadata) no longer exists, and we thus want to cancel any outstanding dirty state. For the (b) case, we now introduce "cancel_dirty_page()", which only touches the page state itself, and verifies that the page is not mapped (since cancelling writes on a mapped page would be actively wrong as it is still accessible to users). Some filesystems need to be fixed up for this: CIFS, FUSE, JFS, ReiserFS, XFS all use the old confusing functions, and will be fixed separately in subsequent commits (with some of them just removing the offending logic, and others using clear_page_dirty_for_io()). This was confirmed by Martin Michlmayr to fix the apt database corruption on ARM. Cc: Martin Michlmayr <tbm@cyrius.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andrei Popa <andrei.popa@i-neo.ro> Cc: Andrew Morton <akpm@osdl.org> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Gordon Farquharson <gordonfarquharson@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 18 12月, 2006 1 次提交
-
-
由 Oleg Nesterov 提交于
fix a typo, sys_mincore() needs min(). Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus "I'm a moron" Torvalds <torvalds@osdl.org>
-
- 17 12月, 2006 2 次提交
-
-
由 Linus Torvalds 提交于
Hugh Dickins correctly points out that mincore() is actually _supposed_ to fail on an unmapped hole in the user address space, rather than return valid ("empty") information about the hole. This just simplifies the problem further (I had been misled by our previous confusing and complicated way of doing mincore()). Also, in the unlikely situation that we can't allocate a temporary kernel buffer, we should actually return EAGAIN, not ENOMEM, to keep the "unmapped hole" and "allocation failure" error cases separate. Finally, add a comment about our stupid historical lack of support for anonymous mappings. I'll fix that if somebody reminds me after 2.6.20 is out. Acked-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
Doug Chapman noticed that mincore() will doa "copy_to_user()" of the result while holding the mmap semaphore for reading, which is a big no-no. While a recursive read-lock on a semaphore in the case of a page fault happens to work, we don't actually allow them due to deadlock schenarios with writers due to fairness issues. Doug and Marcel sent in a patch to fix it, but I decided to just rewrite the mess instead - not just fixing the locking problem, but making the code smaller and (imho) much easier to understand. Cc: Doug Chapman <dchapman@redhat.com> Cc: Marcel Holtmann <holtmann@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 14 12月, 2006 6 次提交
-
-
由 Atsushi Nemoto 提交于
To allow a more effective copy_user_highpage() on certain architectures, a vma argument is added to the function and cow_user_page() allowing the implementation of these functions to check for the VM_EXEC bit. The main part of this patch was originally written by Ralf Baechle; Atushi Nemoto did the the debugging. Signed-off-by: NAtsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: NRalf Baechle <ralf@linux-mips.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Eric Dumazet 提交于
When some objects are allocated by one CPU but freed by another CPU we can consume lot of cycles doing divides in obj_to_index(). (Typical load on a dual processor machine where network interrupts are handled by one particular CPU (allocating skbufs), and the other CPU is running the application (consuming and freeing skbufs)) Here on one production server (dual-core AMD Opteron 285), I noticed this divide took 1.20 % of CPU_CLK_UNHALTED events in kernel. But Opteron are quite modern cpus and the divide is much more expensive on oldest architectures : On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1 cycle for a multiply. Doing some math, we can use a reciprocal multiplication instead of a divide. If we want to compute V = (A / B) (A and B being u32 quantities) we can instead use : V = ((u64)A * RECIPROCAL(B)) >> 32 ; where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B Note : I wrote pure C code for clarity. gcc output for i386 is not optimal but acceptable : mull 0x14(%ebx) mov %edx,%eax // part of the >> 32 xor %edx,%edx // useless mov %eax,(%esp) // could be avoided mov %edx,0x4(%esp) // useless mov (%esp),%ebx [akpm@osdl.org: small cleanups] Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Elaborate the API for calling cpuset_zone_allowed(), so that users have to explicitly choose between the two variants: cpuset_zone_allowed_hardwall() cpuset_zone_allowed_softwall() Until now, whether or not you got the hardwall flavor depended solely on whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask argument. If you didn't specify __GFP_HARDWALL, you implicitly got the softwall version. Unfortunately, this meant that users would end up with the softwall version without thinking about it. Since only the softwall version might sleep, this led to bugs with possible sleeping in interrupt context on more than one occassion. The hardwall version requires that the current tasks mems_allowed allows the node of the specified zone (or that you're in interrupt or that __GFP_THISNODE is set or that you're on a one cpuset system.) The softwall version, depending on the gfp_mask, might allow a node if it was allowed in the nearest enclusing cpuset marked mem_exclusive (which requires taking the cpuset lock 'callback_mutex' to evaluate.) This patch removes the cpuset_zone_allowed() call, and forces the caller to explicitly choose between the hardwall and the softwall case. If the caller wants the gfp_mask to determine this choice, they should (1) be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the cpuset_zone_allowed_softwall() routine. This adds another 100 or 200 bytes to the kernel text space, due to the few lines of nearly duplicate code at the top of both cpuset_zone_allowed_* routines. It should save a few instructions executed for the calls that turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to set (before the call) then check (within the call) the __GFP_HARDWALL flag. For the most critical call, from get_page_from_freelist(), the same instructions are executed as before -- the old cpuset_zone_allowed() routine it used to call is the same code as the cpuset_zone_allowed_softwall() routine that it calls now. Not a perfect win, but seems worth it, to reduce this chance of hitting a sleeping with irq off complaint again. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
More cleanups for slab.h 1. Remove tabs from weird locations as suggested by Pekka 2. Drop the check for NUMA and SLAB_DEBUG from the fallback section as suggested by Pekka. 3. Uses static inline for the fallback defs as also suggested by Pekka. 4. Make kmem_ptr_valid take a const * argument. 5. Separate the NUMA fallback definitions from the kmalloc_track fallback definitions. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
This is a response to an earlier discussion on linux-mm about splitting slab.h components per allocator. Patch is against 2.6.19-git11. See http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2 This patch cleans up the slab header definitions. We define the common functions of slob and slab in slab.h and put the extra definitions needed for slab's kmalloc implementations in <linux/slab_def.h>. In order to get a greater set of common functions we add several empty functions to slob.c and also rename slob's kmalloc to __kmalloc. Slob does not need any special definitions since we introduce a fallback case. If there is no need for a slab implementation to provide its own kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions. That is sufficient for SLOB. Sort the function in slab.h according to their functionality. First the functions operating on struct kmem_cache * then the kmalloc related functions followed by special debug and fallback definitions. Also redo a lot of comments. Signed-off-by: Christoph Lameter <clameter@sgi.com>? Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
Fallback_alloc() does not do the check for GFP_WAIT as done in cache_grow(). Thus interrupts are disabled when we call kmem_getpages() which results in the failure. Duplicate the handling of GFP_WAIT in cache_grow(). Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Jay Cliburn <jacliburn@bellsouth.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 11 12月, 2006 1 次提交
-
-
由 Arjan van de Ven 提交于
This patch introduces users of the round_jiffies() function in the slab code. The slab code has a few "run every second" timers for background work; these are obviously not timing critical as long as they happen roughly at the right frequency. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-