提交 · f9c98d0287de42221c624482fd4f8d485c98ab22 · openeuler / raspberrypi-kernel

30 10月, 2005 8 次提交

[PATCH] mm: batch updating mm_counters · ae859762

由 Hugh Dickins 提交于 10月 29, 2005

tlb_finish_mmu used to batch zap_pte_range's update of mm rss, which may be
worthwhile if the mm is contended, and would reduce atomic operations if the
counts were atomic.  Let zap_pte_range now batch its updates to file_rss and
anon_rss, per page-table in case we drop the lock outside; and copy_pte_range
batch them too.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ae859762

[PATCH] mm: rss = file_rss + anon_rss · 4294621f

由 Hugh Dickins 提交于 10月 29, 2005

I was lazy when we added anon_rss, and chose to change as few places as
possible.  So currently each anonymous page has to be counted twice, in rss
and in anon_rss.  Which won't be so good if those are atomic counts in some
configurations.

Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics.  And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4294621f

[PATCH] mm: tlb_finish_mmu forget rss · fc2acab3

由 Hugh Dickins 提交于 10月 29, 2005

zap_pte_range has been counting the pages it frees in tlb->freed, then
tlb_finish_mmu has used that to update the mm's rss.  That got stranger when I
added anon_rss, yet updated it by a different route; and stranger when rss and
anon_rss became mm_counters with special access macros.  And it would no
longer be viable if we're relying on page_table_lock to stabilize the
mm_counter, but calling tlb_finish_mmu outside that lock.

Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own
business, just decrement the rss mm_counter in zap_pte_range (yes, there was
some point to batching the update, and a subsequent patch restores that).  And
forget the anal paranoia of first reading the counter to avoid going negative
- if rss does go negative, just fix that bug.

Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use
was being made of them.  But arm26 alone was actually using the freed, in the
way some others use need_flush: give it a need_flush.  arm26 seems to prefer
spaces to tabs here: respect that.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fc2acab3

[PATCH] mm: tlb_is_full_mm was obscure · 4d6ddfa9

由 Hugh Dickins 提交于 10月 29, 2005

tlb_is_full_mm? What does that mean? The TLB is full? No, it means that the
mm's last user has gone and the whole mm is being torn down. And it's an
inline function because sparc64 uses a different (slightly better)
"tlb_frozen" name for the flag others call "fullmm".

And now the ptep_get_and_clear_full macro used in zap_pte_range refers
directly to tlb->fullmm, which would be wrong for sparc64. Rather than
correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change
sparc64 to just use the same poor name as everyone else - is that okay?
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4d6ddfa9

[PATCH] mm: page fault handlers tidyup · 65500d23

由 Hugh Dickins 提交于 10月 29, 2005

Impose a little more consistency on the page fault handlers do_wp_page,
do_swap_page, do_anonymous_page, do_no_page, do_file_page: why not pass their
arguments in the same order, called the same names?

break_cow is all very well, but what it did was inlined elsewhere: easier to
compare if it's brought back into do_wp_page.

do_file_page's fallback to do_no_page dates from a time when we were testing
pte_file by using it wherever possible: currently it's peculiar to nonlinear
vmas, so just check that. BUG_ON if not? Better not, it's probably page
table corruption, so just show the pte: hmm, there's a pte_ERROR macro, let's
use that for do_wp_page's invalid pfn too.

Hah! Someone in the ppc64 world noticed pte_ERROR was unused so removed it:
restored (and say "pud" not "pmd" in its pud_ERROR).
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

65500d23

[PATCH] mm: anon is already wrprotected · 72866f6f

由 Hugh Dickins 提交于 10月 29, 2005

do_anonymous_page's pte_wrprotect causes some confusion: in such a case,
vm_page_prot must already be forcing COW, so must omit write permission, and
so the pte_wrprotect is redundant. Replace it by a comment to that effect,
and reword the comment on unuse_pte which also caused confusion.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

72866f6f

[PATCH] mm: zap_pte_range dont dirty anon · 6237bcd9

由 Hugh Dickins 提交于 10月 29, 2005

zap_pte_range already avoids wasting time to mark_page_accessed on anon pages:
it can also skip anon set_page_dirty - the page only needs to be marked dirty
if shared with another mm, but that will say pte_dirty too.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6237bcd9

[PATCH] mm: copy_pte_range progress fix · e040f218

由 Hugh Dickins 提交于 10月 29, 2005

My latency breaking in copy_pte_range didn't work as intended: instead of
checking at regularish intervals, after the first interval it checked every
time around the loop, too impatient to be preempted.  Fix that.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e040f218

21 10月, 2005 1 次提交

[PATCH] Fix handling spurious page fault for hugetlb region · ac9b9c66

由 Hugh Dickins 提交于 10月 20, 2005

This reverts commit 3359b54c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ac9b9c66

20 10月, 2005 1 次提交

[PATCH] Handle spurious page fault for hugetlb region · 3359b54c

由 Seth, Rohit 提交于 10月 18, 2005

The hugetlb pages are currently pre-faulted.  At the time of mmap of
hugepages, we populate the new PTEs.  It is possible that HW has already
cached some of the unused PTEs internally.  These stale entries never
get a chance to be purged in existing control flow.

This patch extends the check in page fault code for hugepages.  Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).
Signed-off-by: NRohit Seth <rohit.seth@intel.com>

[ This is apparently arguably an ia64 port bug. But the code won't
  hurt, and for now it fixes a real problem on some ia64 machines ]
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3359b54c

11 9月, 2005 1 次提交

[PATCH] mm/filemap.c: make two functions static · 5ce7852c

由 Adrian Bunk 提交于 9月 10, 2005

With Nick Piggin <npiggin@suse.de>

Give some things static scope.
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5ce7852c

05 9月, 2005 2 次提交

[PATCH] x86: ptep_clear optimization · a600388d

由 Zachary Amsden 提交于 9月 03, 2005

Add a new accessor for PTEs, which passes the full hint from the mmu_gather
struct; this allows architectures with hardware pagetables to optimize away
atomic PTE operations when destroying an address space. Removing the
locked operation should allow better pipelining of memory access in this
loop. I measured an average savings of 30-35 cycles per zap_pte_range on
the first 500 destructions on Pentium-M, but I believe the optimization
would win more on older processors which still assert the bus lock on xchg
for an exclusive cacheline.

Update: I made some new measurements, and this saves exactly 26 cycles over
ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
full address space destruction.

pte_clear_full is not yet used, but is provided for future optimizations
(in particular, when running inside of a hypervisor that queues page table
updates, the full hint allows us to avoid queueing unnecessary page table
update for an address space in the process of being destroyed.

This is not a huge win, but it does help a bit, and sets the stage for
further hypervisor optimization of the mm layer on all architectures.
Signed-off-by: NZachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a600388d

[PATCH] mm: remove implied vm_ops check · 4944e76d

由 Paolo 'Blaisorblade' Giarrusso 提交于 9月 03, 2005

If !vma->vm-ops we already BUG above, so retesting it is useless.  The
compiler cannot optimize this because BUG is a macro and is not thus marked
noreturn; that should possibly be fixed.
Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4944e76d

30 8月, 2005 1 次提交

[PATCH] Lazy page table copies in fork() · d992895b

由 Nick Piggin 提交于 8月 28, 2005

Defer copying of ptes until fault time when it is possible to reconstruct
the pte from backing store. Idea from Andi Kleen and Nick Piggin.

Thanks to input from Rik van Riel and Linus and to Hugh for correcting
my blundering.

Ray Fucillo <fucillo@intersystems.com> reports:

  "I applied this latest patch to a 2.6.12 kernel and found that it does
   resolve the problem.  Prior to the patch on this machine, I was
   seeing about 23ms spent in fork for ever 100MB of shared memory
   segment.

   After applying the patch, fork is taking about 1ms regardless of the
   shared memory size."
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d992895b

04 8月, 2005 2 次提交

Fix up recent get_user_pages() handling · a68d2ebc

由 Linus Torvalds 提交于 8月 03, 2005

The VM_FAULT_WRITE thing is an extra bit, not a valid return value, and
has to be treated as such by get_user_pages().
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a68d2ebc

[PATCH] fix get_user_pages bug · f33ea7f4

由 Nick Piggin 提交于 8月 03, 2005

Checking pte_dirty instead of pte_write in __follow_page is problematic
for s390, and for copy_one_pte which leaves dirty when clearing write.

So revert __follow_page to check pte_write as before, and make
do_wp_page pass back a special extra VM_FAULT_WRITE bit to say it has
done its full job: once get_user_pages receives this value, it no longer
requires pte_write in __follow_page.

But most callers of handle_mm_fault, in the various architectures, have
switch statements which do not expect this new case.  To avoid changing
them all in a hurry, make an inline wrapper function (using the old
name) that masks off the new bit, and use the extended interface with
double underscores.

Yes, we do have a call to do_wp_page from do_swap_page, but no need to
change that: in rare case it's needed, another do_wp_page will follow.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
[ Cleanups by Nick Piggin ]
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f33ea7f4

02 8月, 2005 2 次提交

[PATCH] x86_64: access of some bad address · 690dbe1c

由 Hugh Dickins 提交于 8月 01, 2005

x86_64 has a large sparse gate area between VSYSCALL_START and
VSYSCALL_END, not all of it presently backed by pmds.  Alexander Nyberg has
found that in some circumstances gdb may try to ptrace here, and hit
get_user_pages BUG_ON.  It seems odd that gdb should be accessing here, but
it certainly shouldn't crash in this way: relax BUG_ON to -EFAULT.  Fixes
kernel bugzilla #4801.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

690dbe1c

Fix get_user_pages() race for write access · 4ceb5db9

由 Linus Torvalds 提交于 8月 01, 2005

There's no real guarantee that handle_mm_fault() will always be able to
break a COW situation - if an update from another thread ends up
modifying the page table some way, handle_mm_fault() may end up
requiring us to re-try the operation.

That's normally fine, but get_user_pages() ended up re-trying it as a
read, and thus a write access could in theory end up losing the dirty
bit or be done on a page that had not been properly COW'ed.

This makes get_user_pages() always retry write accesses as write
accesses by making "follow_page()" require that a writable follow has
the dirty bit set. That simplifies the code and solves the race: if the
COW break fails for some reason, we'll just loop around and try again.
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4ceb5db9

28 7月, 2005 1 次提交

[PATCH] check_user_page_readable() deadlock fix · 1aaf18ff

由 Andrew Morton 提交于 7月 27, 2005

Fix bug identifued by Richard Purdie <rpurdie@rpsys.net>.

oprofile calls check_user_page_readable() from interrupt context, so we
deadlock over various VFS locks.

But check_user_page_readable() doesn't imply either a read or a write of the
page's contents.  Change __follow_page() so that check_user_page_readable()
can tell __follow_page() that we're not accessing the page's contents, and use
that info to avoid the troublesome lock-takings.

Also, make follow_page() inline for the single callsite in memory.c to save a
bit of stack space.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1aaf18ff

26 6月, 2005 1 次提交

[PATCH] mm: fix remap_pte_range BUG · 2d15cab8

由 Hugh Dickins 提交于 6月 25, 2005

Out-of-tree user of remap_pfn_range hit kernel BUG at mm/memory.c:1112!  It
passes an unrounded size to remap_pfn_range, which was okay before 2.6.12,
but misses remap_pte_range's new end condition.  An audit of all the other
ptwalks confirms that this is the only one so exposed.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2d15cab8

24 6月, 2005 2 次提交

[PATCH] DocBook: update comments · 3d41088f

由 Martin Waitz 提交于 6月 23, 2005

This patch updates some comments to match code changes.
Signed-off-by: NMartin Waitz <tali@admingilde.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3d41088f

[PATCH] sparsemem memory model · d41dee36

由 Andy Whitcroft 提交于 6月 23, 2005

Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NMartin Bligh <mbligh@aracnet.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: NBob Picco <bob.picco@hp.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d41dee36

22 6月, 2005 3 次提交

[PATCH] can_share_swap_page: use page_mapcount · c475a8ab

由 Hugh Dickins 提交于 6月 21, 2005

Remember that ironic get_user_pages race? when the raised page_count on a
page swapped out led do_wp_page to decide that it had to copy on write, so
substituted a different page into userspace. 2.6.7 onwards have Andrea's
solution, where try_to_unmap_one backs out if it finds page_count raised.

Which works, but is unsatisfying (rmap.c has no other page_count heuristics),
and was found a few months ago to hang an intensive page migration test. A
year ago I was hesitant to engage page_mapcount, now it seems the right fix.

So remove the page_count hack from try_to_unmap_one; and use activate_page in
unuse_mm when dropping lock, to replace its secondary effect of helping
swapoff to make progress in that case.

Simplify can_share_swap_page (now called only on anonymous pages) to check
page_mapcount + page_swapcount == 1: still needs the page lock to stabilize
their (pessimistic) sum, but does not need swapper_space.tree_lock for that.

In do_swap_page, move swap_free and unlock_page below page_add_anon_rmap, to
keep sum on the high side, and correct when can_share_swap_page called.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c475a8ab

[PATCH] do_wp_page: cannot share file page · d296e9cd

由 Hugh Dickins 提交于 6月 21, 2005

A small optimization to do_wp_page's check for whether to avoid copy by
reusing the page already mapped. It can never share a cached file page,
nor can it share a reserved page (often the empty zero page), so it's a
waste of time to lock and unlock in those cases. Which nowadays can both
be neatly excluded by a preliminary PageAnon test.

Christoph has reported that a preliminary page_count test proved valuable
for scalability here, but PageAnon covers more common cases all at once.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d296e9cd

[PATCH] get_user_pages: kill get_page_map · 08ef4729

由 Hugh Dickins 提交于 6月 21, 2005

Since its birth, get_user_pages has been calling a misguided get_page_map
function.  follow_page has already returned NULL if the pfn is invalid, we
cannot reach an invalid pfn from a validated struct page.

Remove get_page_map, and the messy rewind in get_user_pages to cope with
its failure.  Oh, and could we please call that "struct page *page" like
everywhere else, instead of "struct page *map"?
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

08ef4729

17 5月, 2005 1 次提交

[PATCH] do_swap_page() can map random data if swap read fails · b8107480

由 Kirill Korotaev 提交于 5月 16, 2005

There is a bug in do_swap_page(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space.  The fix is
to check for PageUptodate and send SIGBUS in case of error.
Signed-Off-By: NKirill Korotaev <dev@sw.ru>
Signed-Off-By: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b8107480

20 4月, 2005 3 次提交

[PATCH] freepgt: hugetlb_free_pgd_range · 3bf5ee95

由 Hugh Dickins 提交于 4月 19, 2005

ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
called, and it wasn't obvious what to do about them.

The ppc64 case turns out to be easy: the associated tables are noted elsewhere
and freed later, safe to either skip its hugetlb areas or go through the
motions of freeing nothing. Since ia64 does need a special case, restore to
ppc64 the special case of skipping them.

The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
probably appeared to work okay if you just had one such area; in fact it's
been broken much longer if you consider a long munmap spanning from another
region into the hugetlb region.

In the ia64 hugetlb region, more virtual address bits are available than in
the other regions, yet the page tables are structured the same way: the page
at the bottom is larger. Here we need to scale down each addr before passing
it to the standard free_pgd_range. Was about to write a hugely_scaled_down
macro, but found htlbpage_to_page already exists for just this purpose. Fixed
off-by-one in ia64 is_hugepage_only_range.

Uninline free_pgd_range to make it available to ia64. Make sure the
vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
other (safe to join huges? probably but don't bother).
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3bf5ee95

[PATCH] freepgt: remove MM_VM_SIZE(mm) · ee39b37b

由 Hugh Dickins 提交于 4月 19, 2005

There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
because mm doesn't contain the (32-bit emulation?) info needed. But it too is
only needed because we ignore the end from the vma list.

We could make flush_pgtables return that end, or unmap_vmas. Choose the
latter, since it's a natural fit with unmap_mapping_range_vma needing to know
its restart addr. This does make more than minimal change, but if unmap_vmas
had returned the end before, this is how we'd have done it, rather than
storing the break_addr in zap_details.

unmap_vmas used to return count of vmas scanned, but that's just debug which
hasn't been useful in a while; and if we want the map_count 0 on exit check
back, it can easily come from the final remove_vm_struct loop.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ee39b37b

[PATCH] freepgt: free_pgtables use vma list · e0da382c

由 Hugh Dickins 提交于 4月 19, 2005

Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
clear_page_range regression since 2.6.10's clear_page_tables; and its
long-standing well-known inefficiency in searching throughout the higher-level
page tables for those few entries to clear and free: all can be blamed on
ignoring the list of vmas when we free page tables.

Replace exit_mmap's clear_page_range of the total user address space by
free_pgtables operating on the mm's vma list; unmap_region use it in the same
way, giving floor and ceiling beyond which it may not free tables. This
brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
in which case latency fixes spoil unmap_vmas throughput).

Beware: the do_mmap_pgoff driver failure case must now use unmap_region
instead of zap_page_range, since a page table might have been allocated, and
can only be freed while it is touched by some vma.

Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
from the clear_page_range levels. (Most of free_pgtables' old code was
actually for a non-existent case, prev not properly set up, dating from before
hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
might want to add latency lockdrops later; but no attempt to do so yet, going
by vma should itself reduce latency.

But what if is_hugepage_only_range? Those ia64 and ppc64 cases need careful
examination: put that off until a later patch of the series.

What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?

And the range to sparc64's flush_tlb_pgtables? It's less clear to me now that
we need to do more than is done here - every PMD_SIZE ever occupied will be
flushed, do we really have to flush every PGDIR_SIZE ever partially occupied?
A shame to complicate it unnecessarily.

Special thanks to David Miller for time spent repairing my ceilings.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e0da382c

17 4月, 2005 1 次提交

Linux-2.6.12-rc2 · 1da177e4

由 Linus Torvalds 提交于 4月 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4