1. 27 Jan 2007 (4 commits)
  2. 23 Jan 2007 (1 commit)
  3. 13 Jan 2007 (1 commit)
  4. 12 Jan 2007 (2 commits)
    • [PATCH] NFS: Fix race in nfs_release_page() · e3db7691
      Committed by Trond Myklebust
          NFS: Fix race in nfs_release_page()
      
          invalidate_inode_pages2() may find the dirty bit has been set on a page
          owing to the fact that the page may still be mapped after it was locked.
          Only after the call to unmap_mapping_range() are we sure that the page
          can no longer be dirtied.
          In order to fix this, NFS has hooked the releasepage() method and tries
          to write the page out between the call to unmap_mapping_range() and the
          call to remove_mapping().  This, however, leads to deadlocks in the page
          reclaim code, where the page may be locked without holding a reference
          to the inode or dentry.
      
          Fix is to add a new address_space_operation, launder_page(), which will
          attempt to write out a dirty page without releasing the page lock.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

          Also, the bare SetPageDirty() can skew all sorts of accounting, leading
          to other nasties.

      [akpm@osdl.org: cleanup]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e3db7691
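      A hedged sketch of the new hook's shape as described above (the member name
      launder_page comes from the commit; the helper around it and its exact
      checks are our illustration):

      /* New entry in struct address_space_operations (sketch): */
      int (*launder_page)(struct page *page);     /* called with the page locked */

      /* Hypothetical caller in the invalidate path: write the dirty page out
       * without dropping the page lock, so it cannot be re-dirtied before
       * remove_mapping() runs. */
      static int do_launder_page(struct address_space *mapping, struct page *page)
      {
              if (!PageDirty(page))
                      return 0;
              if (page->mapping != mapping || !mapping->a_ops->launder_page)
                      return 0;
              return mapping->a_ops->launder_page(page);
      }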
    • [PATCH] Fix sparsemem on Cell · a2f3aa02
      Committed by Dave Hansen
      Fix an oops experienced on the Cell architecture when init-time functions,
      early_*(), are called at runtime.  It alters the call paths to make sure
      that the callers explicitly say whether the call is being made on behalf of
      a hotplug event or is happening at boot time.
      
      It has been compile tested on ppc64, ia64, s390, i386 and x86_64.
      Acked-by: Arnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a2f3aa02
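      A hedged sketch of the explicit boot-vs-hotplug flag the commit describes
      (names along these lines; treat them as illustrative):

      enum memmap_context {
              MEMMAP_EARLY,           /* boot-time initialisation           */
              MEMMAP_HOTPLUG,         /* memory added while the system runs */
      };

      /* Callers now state the context instead of letting early_*() guess. */
      void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
                            unsigned long start_pfn, enum memmap_context context);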
  5. 09 Jan 2007 (1 commit)
    • [ARM] pass vma for flush_anon_page() · a6f36be3
      Committed by Russell King
      Since get_user_pages() may be used with processes other than the
      current process and calls flush_anon_page(), flush_anon_page() has to
      cope in some way with non-current processes.
      
      It may not be appropriate, or even desirable, to flush a region of
      virtual memory cache in the current process when that process is
      different from the one we want the flush to occur for.
      
      Therefore, pass the vma into flush_anon_page() so that the architecture
      can work out whether the 'vmaddr' is for the current process or not.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      a6f36be3
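      A sketch of the prototype change, assuming the generic no-op form that
      cache-coherent architectures provide:

      /* Previously: void flush_anon_page(struct page *page, unsigned long vmaddr);
       * The vma is now passed too, so an architecture can tell whether vmaddr
       * refers to the current process before flushing its caches. */
      static inline void flush_anon_page(struct vm_area_struct *vma,
                                         struct page *page, unsigned long vmaddr)
      {
              /* generic no-op; cache-incoherent architectures override this */
      }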
  6. 06 Jan 2007 (6 commits)
  7. 31 Dec 2006 (4 commits)
  8. 30 Dec 2006 (1 commit)
    • VM: Fix nasty and subtle race in shared mmap'ed page writeback · 7658cc28
      Committed by Linus Torvalds
      The VM layer (on the face of it, fairly reasonably) expected that when
      it does a ->writepage() call to the filesystem, it would write out the
      full page at that point in time.  Especially since it had earlier marked
      the whole page dirty with "set_page_dirty()".
      
      But that isn't actually the case: ->writepage() does not actually write
      a page, it writes the parts of the page that have been explicitly marked
      dirty before, *and* that had not got written out for other reasons since
      the last time we told it they were dirty.
      
      That last caveat is the important one.
      
      Which _most_ of the time ends up being the whole page (since we had
      called "set_page_dirty()" on the page earlier), but if the filesystem
      had done any dirty flushing of its own (for example, to honor some
      internal write ordering guarantees), it might end up doing only a
      partial page IO (or none at all) when ->writepage() is actually called.
      
      That is the correct thing in general (since we actually often _want_
      only the known-dirty parts of the page to be written out), but the
      shared dirty page handling had implicitly forgotten about these details,
      and had a number of cases where it was doing just the "->writepage()"
      part, without telling the low-level filesystem that the whole page might
      have been re-dirtied as part of being mapped writably into user space.
      
      Since most of the time the FS did actually write out the full page, we
      didn't notice this for a loong time, and this needed some really odd
      patterns to trigger.  But it caused occasional corruption with rtorrent
      and with the Debian "apt" database, because both use shared mmaps to
      update the end result.
      
      This fixes it. Finally. After way too much hair-pulling.
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Martin J. Bligh <mbligh@google.com>
      Acked-by: Martin Michlmayr <tbm@cyrius.com>
      Acked-by: Martin Johansson <martin@fatbob.nu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Guillaume Chazarain <guichaz@yahoo.fr>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Kenneth Chen <kenneth.w.chen@intel.com>
      Cc: Tobias Diedrich <ranma@tdiedrich.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7658cc28
  9. 24 Dec 2006 (1 commit)
  10. 23 Dec 2006 (7 commits)
  11. 22 Dec 2006 (2 commits)
    • [PATCH] truncate: clear page dirtiness before running try_to_free_buffers() · 3e67c098
      Committed by Andrew Morton
      truncate presently invalidates the dirty page's buffer_heads then shoots down
      the page.  But try_to_free_buffers() will now bail out because the page is
      dirty.
      
      Net effect: the LRU gets filled with dirty pages which have invalidated
      buffer_heads attached.  They have no ->mapping and hence cannot be cleaned.
      The machine leaks memory at an enormous rate.
      
      Fix this by cleaning the page before running try_to_free_buffers(), so
      try_to_free_buffers() can do its work.
      
      Also, remember to do dirty-page accounting in cancel_dirty_page() so the
      machine won't wedge up trying to write non-existent dirty pages.
      
      Probably still wrong, but now less so.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3e67c098
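      A hedged sketch of the ordering this fix establishes in the truncate path
      (simplified; not the verbatim kernel function):

      static void truncate_complete_page(struct address_space *mapping,
                                         struct page *page)
      {
              if (page->mapping != mapping)
                      return;

              /* Clear the dirty bit and its accounting first; otherwise
               * try_to_free_buffers() refuses to touch the dirty page and the
               * invalidated buffer_heads are never freed. */
              cancel_dirty_page(page, PAGE_CACHE_SIZE);

              if (PagePrivate(page))
                      do_invalidatepage(page, 0);  /* may free the buffer_heads */

              ClearPageUptodate(page);
              remove_from_page_cache(page);
              page_cache_release(page);
      }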
    • VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions · fba2591b
      Committed by Linus Torvalds
      They were horribly easy to mis-use because of their tempting naming, and
      they also did way more than any users of them generally wanted them to
      do.
      
      A dirty page can become clean under two circumstances:
      
       (a) when we write it out.  We have "clear_page_dirty_for_io()" for
           this, and that function remains unchanged.
      
           In the "for IO" case it is not sufficient to just clear the dirty
           bit, you also have to mark the page as being under writeback etc.
      
       (b) when we actually remove a page due to it becoming inaccessible to
           users, notably because it was truncate()'d away or the file (or
           metadata) no longer exists, and we thus want to cancel any
           outstanding dirty state.
      
      For the (b) case, we now introduce "cancel_dirty_page()", which only
      touches the page state itself, and verifies that the page is not mapped
      (since cancelling writes on a mapped page would be actively wrong as it
      is still accessible to users).
      
      Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
      ReiserFS, XFS all use the old confusing functions, and will be fixed
      separately in subsequent commits (with some of them just removing the
      offending logic, and others using clear_page_dirty_for_io()).
      
      This was confirmed by Martin Michlmayr to fix the apt database
      corruption on ARM.
      
      Cc: Martin Michlmayr <tbm@cyrius.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fba2591b
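      A hedged sketch of what the new helper does for case (b); the accounting
      call shown is a simplification of the real bookkeeping:

      void cancel_dirty_page(struct page *page, unsigned int account_size)
      {
              /* cancelling writes on a mapped page would be actively wrong:
               * user space could still see and modify it */
              WARN_ON(page_mapped(page));

              if (TestClearPageDirty(page) && account_size &&
                  mapping_cap_account_dirty(page->mapping))
                      dec_zone_page_state(page, NR_FILE_DIRTY);
      }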
  12. 18 Dec 2006 (1 commit)
  13. 17 Dec 2006 (2 commits)
    • Fix up mm/mincore.c error value cases · 4fb23e43
      Committed by Linus Torvalds
      Hugh Dickins correctly points out that mincore() is actually _supposed_
      to fail on an unmapped hole in the user address space, rather than
      return valid ("empty") information about the hole.  This just simplifies
      the problem further (I had been misled by our previous confusing and
      complicated way of doing mincore()).
      
      Also, in the unlikely situation that we can't allocate a temporary
      kernel buffer, we should actually return EAGAIN, not ENOMEM, to keep the
      "unmapped hole" and "allocation failure" error cases separate.
      
      Finally, add a comment about our stupid historical lack of support for
      anonymous mappings.  I'll fix that if somebody reminds me after 2.6.20
      is out.
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4fb23e43
    • Fix incorrect user space access locking in mincore() · 2f77d107
      Committed by Linus Torvalds
      Doug Chapman noticed that mincore() will do a "copy_to_user()" of the
      result while holding the mmap semaphore for reading, which is a big
      no-no.  While a recursive read-lock on a semaphore in the case of a page
      fault happens to work, we don't actually allow them, due to deadlock
      scenarios with writers arising from fairness issues.
      
      Doug and Marcel sent in a patch to fix it, but I decided to just rewrite
      the mess instead - not just fixing the locking problem, but making the
      code smaller and (imho) much easier to understand.
      
      Cc: Doug Chapman <dchapman@redhat.com>
      Cc: Marcel Holtmann <holtmann@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2f77d107
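      A hedged sketch of the locking pattern the rewrite follows (do_mincore, tmp
      and the surrounding variables are illustrative, not the exact new code):

      /* Gather results into a kernel buffer while holding mmap_sem, then drop
       * the lock before touching user space, so a fault in copy_to_user() can
       * never recurse on the semaphore. */
      down_read(&current->mm->mmap_sem);
      retval = do_mincore(start, tmp, min(pages, (unsigned long)PAGE_SIZE));
      up_read(&current->mm->mmap_sem);

      if (retval > 0 && copy_to_user(vec, tmp, retval))
              retval = -EFAULT;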
  14. 14 Dec 2006 (6 commits)
    • [PATCH] Pass vma argument to copy_user_highpage(). · 9de455b2
      Committed by Atsushi Nemoto
      To allow a more effective copy_user_highpage() on certain architectures,
      a vma argument is added to that function and to cow_user_page(), allowing
      the implementations of these functions to check for the VM_EXEC bit.

      The main part of this patch was originally written by Ralf Baechle;
      Atsushi Nemoto did the debugging.
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9de455b2
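      A sketch of the generic fallback with the new argument; an architecture
      override could consult vma->vm_flags & VM_EXEC to decide whether extra
      cache maintenance is needed (illustrative, not the exact kernel code):

      static inline void copy_user_highpage(struct page *to, struct page *from,
                                            unsigned long vaddr,
                                            struct vm_area_struct *vma)
      {
              char *vfrom = kmap_atomic(from, KM_USER0);
              char *vto   = kmap_atomic(to, KM_USER1);

              copy_user_page(vto, vfrom, vaddr, to);

              kunmap_atomic(vfrom, KM_USER0);
              kunmap_atomic(vto, KM_USER1);
              /* an arch version might add:
               *   if (vma->vm_flags & VM_EXEC) flush/invalidate the icache */
      }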
    • [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() · 6a2d7a95
      Committed by Eric Dumazet
      When some objects are allocated by one CPU but freed by another CPU we can
      consume a lot of cycles doing divides in obj_to_index().
      
      (Typical load on a dual processor machine where network interrupts are
      handled by one particular CPU (allocating skbufs), and the other CPU is
      running the application (consuming and freeing skbufs))
      
      Here on one production server (dual-core AMD Opteron 285), I noticed this
      divide took 1.20 % of CPU_CLK_UNHALTED events in the kernel.  But Opterons
      are quite modern CPUs, and the divide is much more expensive on older
      architectures:
      
      On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
      cycle for a multiply.
      
      Doing some math, we can use a reciprocal multiplication instead of a divide.
      
      If we want to compute V = (A / B)  (A and B being u32 quantities)
      we can instead use :
      
      V = ((u64)A * RECIPROCAL(B)) >> 32 ;
      
      where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
      
      Note :
      
      I wrote pure C code for clarity. gcc output for i386 is not optimal but
      acceptable:
      
      mull   0x14(%ebx)
      mov    %edx,%eax // part of the >> 32
      xor     %edx,%edx // useless
      mov    %eax,(%esp) // could be avoided
      mov    %edx,0x4(%esp) // useless
      mov    (%esp),%ebx
      
      [akpm@osdl.org: small cleanups]
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6a2d7a95
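      A self-contained illustration of the reciprocal trick described above
      (plain user-space C with our own helper names; the kernel's helpers may
      differ):

      #include <stdint.h>
      #include <stdio.h>

      /* RECIPROCAL(B): precompute ((1LL << 32) + B - 1) / B once, at setup. */
      static uint32_t reciprocal_value(uint32_t b)
      {
              return (uint32_t)((((uint64_t)1 << 32) + b - 1) / b);
      }

      /* V = A / B becomes a multiply and a shift on the hot path. */
      static uint32_t reciprocal_divide(uint32_t a, uint32_t recip_b)
      {
              return (uint32_t)(((uint64_t)a * recip_b) >> 32);
      }

      int main(void)
      {
              uint32_t size   = 192;                      /* object size in the cache */
              uint32_t recip  = reciprocal_value(size);   /* stored once per cache    */
              uint32_t offset = 5 * size;                 /* obj address - slab base  */

              /* obj_to_index()-style computation without a divide instruction */
              printf("index = %u\n", reciprocal_divide(offset, recip));
              return 0;
      }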
    • [PATCH] cpuset: rework cpuset_zone_allowed api · 02a0e53d
      Committed by Paul Jackson
      Elaborate the API for calling cpuset_zone_allowed(), so that users have to
      explicitly choose between the two variants:
      
        cpuset_zone_allowed_hardwall()
        cpuset_zone_allowed_softwall()
      
      Until now, whether or not you got the hardwall flavor depended solely on
      whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
      argument.
      
      If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
      version.
      
      Unfortunately, this meant that users would end up with the softwall version
      without thinking about it.  Since only the softwall version might sleep,
      this led to bugs with possible sleeping in interrupt context on more than
      one occasion.
      
      The hardwall version requires that the current task's mems_allowed allows
      the node of the specified zone (or that you're in interrupt, or that
      __GFP_THISNODE is set, or that you're on a one-cpuset system.)

      The softwall version, depending on the gfp_mask, might allow a node if it
      was allowed in the nearest enclosing cpuset marked mem_exclusive (which
      requires taking the cpuset lock 'callback_mutex' to evaluate.)
      
      This patch removes the cpuset_zone_allowed() call, and forces the caller to
      explicitly choose between the hardwall and the softwall case.
      
      If the caller wants the gfp_mask to determine this choice, they should (1)
      be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
      cpuset_zone_allowed_softwall() routine.
      
      This adds another 100 or 200 bytes to the kernel text space, due to the few
      lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
      routines.  It should save a few instructions executed for the calls that
      turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
      set (before the call) then check (within the call) the __GFP_HARDWALL flag.
      
      For the most critical call, from get_page_from_freelist(), the same
      instructions are executed as before -- the old cpuset_zone_allowed()
      routine it used to call is the same code as the
      cpuset_zone_allowed_softwall() routine that it calls now.
      
      Not a perfect win, but it seems worth it, to reduce the chance of hitting a
      sleeping-with-irqs-off complaint again.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      02a0e53d
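      A hedged illustration of the explicit choice a caller now makes; the two
      function names come from the commit, the helper wrapped around them is ours:

      static int zone_ok_for_this_alloc(struct zone *zone, gfp_t gfp_mask)
      {
              if (in_interrupt() || !(gfp_mask & __GFP_WAIT))
                      /* never sleeps; checks only the current task's mems_allowed */
                      return cpuset_zone_allowed_hardwall(zone, gfp_mask);

              /* may take callback_mutex and walk up to the nearest
               * mem_exclusive cpuset, so process context only */
              return cpuset_zone_allowed_softwall(zone, gfp_mask);
      }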
    • [PATCH] More slab.h cleanups · 55935a34
      Committed by Christoph Lameter
      More cleanups for slab.h
      
      1. Remove tabs from weird locations as suggested by Pekka
      
      2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
         as suggested by Pekka.
      
      3. Use static inline for the fallback defs, as also suggested by Pekka.
      
      4. Make kmem_ptr_valid take a const * argument.
      
      5. Separate the NUMA fallback definitions from the kmalloc_track fallback
         definitions.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55935a34
    • [PATCH] Cleanup slab headers / API to allow easy addition of new slab allocators · 2e892f43
      Committed by Christoph Lameter
      This is a response to an earlier discussion on linux-mm about splitting
      slab.h components per allocator.  Patch is against 2.6.19-git11.  See
      http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2
      
      This patch cleans up the slab header definitions.  We define the common
      functions of slob and slab in slab.h and put the extra definitions needed
      for slab's kmalloc implementations in <linux/slab_def.h>.  In order to get
      a greater set of common functions we add several empty functions to slob.c
      and also rename slob's kmalloc to __kmalloc.
      
      Slob does not need any special definitions since we introduce a fallback
      case.  If there is no need for a slab implementation to provide its own
      kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
      That is sufficient for SLOB.
      
      Sort the functions in slab.h according to their functionality.  First the
      functions operating on struct kmem_cache *, then the kmalloc-related
      functions, followed by special debug and fallback definitions.
      
      Also redo a lot of comments.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2e892f43
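      Roughly the shape of the fallback being described (a hedged sketch, not the
      header verbatim):

      #ifdef CONFIG_SLAB
      #include <linux/slab_def.h>   /* SLAB supplies its own optimised kmalloc() */
      #else
      /* An allocator with no special definitions (e.g. SLOB) just falls back. */
      static inline void *kmalloc(size_t size, gfp_t flags)
      {
              return __kmalloc(size, flags);
      }
      #endif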
    • [PATCH] slab: fix sleeping in atomic bug · dd47ea75
      Committed by Christoph Lameter
      fallback_alloc() does not check for __GFP_WAIT the way cache_grow() does.
      Thus interrupts are still disabled when we call kmem_getpages(), which
      results in the failure.

      Duplicate the __GFP_WAIT handling from cache_grow().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Jay Cliburn <jacliburn@bellsouth.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      dd47ea75
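      A sketch of the duplicated cache_grow()-style handling the fix describes:
      re-enable interrupts around the page allocation when the caller is allowed
      to sleep (simplified, not the verbatim patch):

      /* inside fallback_alloc(), before asking the page allocator directly */
      if (local_flags & __GFP_WAIT)
              local_irq_enable();              /* kmem_getpages() may sleep */

      obj = kmem_getpages(cache, local_flags, numa_node_id());

      if (local_flags & __GFP_WAIT)
              local_irq_disable();             /* restore slab's irq expectations */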
  15. 11 Dec 2006 (1 commit)