- May 09, 2007: 2 commits
-
-
Committed by Christoph Hellwig

This patch moves the die notifier handling to common code. Previously the various architectures had exactly the same code for it. Note that the new code is compiled unconditionally; this should be understood as an appeal to the other architecture maintainers to implement support for it as well (aka sprinkling a notify_die or two in the proper places).

arm had a notify_die that did something totally different, so I renamed it to arm_notify_die as part of the patch and made it static to the file in which it is declared and used.

avr32 used to pass slightly less information through this interface; I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
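For reference, the now-common interface consolidated here is the usual notifier-chain pattern; a minimal sketch of a client hooking the die chain (the handler name and message are illustrative, not from the patch):

    #include <linux/kdebug.h>
    #include <linux/notifier.h>

    /* Illustrative handler: called for every notify_die() event. */
    static int my_die_handler(struct notifier_block *nb,
                              unsigned long val, void *data)
    {
            struct die_args *args = data;

            printk(KERN_INFO "die event %lu: %s\n", val, args->str);
            return NOTIFY_DONE;
    }

    static struct notifier_block my_die_nb = {
            .notifier_call = my_die_handler,
    };

    /* registered once, e.g. from an init function: */
    /* register_die_notifier(&my_die_nb); */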
-
Committed by Akinobu Mita

Use SLAB_PANIC and delete the duplicated panic().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
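The pattern being removed, sketched on the i386 pgd cache as one plausible instance (assuming the 2.6.21-era six-argument kmem_cache_create() signature):

    /* Before: an explicit check duplicating what the allocator can do. */
    pgd_cache = kmem_cache_create("pgd", PTRS_PER_PGD * sizeof(pgd_t),
                                  0, 0, pgd_ctor, pgd_dtor);
    if (!pgd_cache)
            panic("pgtable_cache_init(): cannot create pgd cache");

    /* After: SLAB_PANIC makes kmem_cache_create() panic on failure. */
    pgd_cache = kmem_cache_create("pgd", PTRS_PER_PGD * sizeof(pgd_t),
                                  0, SLAB_PANIC, pgd_ctor, pgd_dtor);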
-
- May 08, 2007: 1 commit
-
-
Committed by Benjamin Herrenschmidt

Handle MAP_FIXED in the i386 hugetlb_get_unmapped_area(): just call prepare_hugepage_range().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
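The shape of that early-out, as a sketch (argument validation abbreviated; the prototype is assumed to follow the 2.6.21-era i386 code and may differ in detail):

    unsigned long hugetlb_get_unmapped_area(struct file *file,
                    unsigned long addr, unsigned long len,
                    unsigned long pgoff, unsigned long flags)
    {
            if (len & ~HPAGE_MASK)
                    return -EINVAL;
            if (len > TASK_SIZE)
                    return -ENOMEM;

            /* New: for MAP_FIXED, only validate the caller's range. */
            if (flags & MAP_FIXED) {
                    if (prepare_hugepage_range(addr, len, pgoff))
                            return -EINVAL;
                    return addr;
            }

            /* ... otherwise fall through to the normal area search ... */
    }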
-
- May 07, 2007: 1 commit
-
-
Committed by Linus Torvalds

This was broken. It adds complexity, for no good reason. Rather than separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(), and preferably __pa() too, and just use "virt_to_phys()" instead, which is more readable and has nicer semantics.

However, right now, just undo the separation, and make __pa_symbol() be the exact same as __pa(). That fixes the bugs this patch introduced, and we can do the fairly obvious cleanups later.

Do the new __phys_addr() function (which is now the actual workhorse for the unified __pa()/__pa_symbol()) as a real external function. That way all the potential issues with compile/link-time optimizations of constant symbol addresses go away, and we can also, if we choose to, add more sanity-checking of the argument.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
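A sketch of what the unified x86-64 definitions look like after this change (simplified; details may differ from the merged code):

    /* include/asm-x86_64/page.h */
    extern unsigned long __phys_addr(unsigned long virt_addr);

    #define __pa(x)        __phys_addr((unsigned long)(x))
    #define __pa_symbol(x) __pa(x)

    /* out of line, so constant folding of symbol addresses cannot
     * bypass the address-range check: */
    unsigned long __phys_addr(unsigned long x)
    {
            if (x >= __START_KERNEL_map)
                    return x - __START_KERNEL_map + phys_base;
            return x - PAGE_OFFSET;
    }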
-
- May 03, 2007: 9 commits
-
-
Committed by Jeremy Fitzhardinge

kunmap_atomic() should flush any pending lazy mmu updates, mainly to be consistent with kmap_atomic(), and to preserve its normal behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jeremy Fitzhardinge

Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure.

Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot(), which maps the page with the specified page protections.

This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable.

[ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
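A sketch of kmap_atomic_prot(), close to (but not verbatim from) the 2.6.22-era i386 highmem.c:

    void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
    {
            enum fixed_addresses idx;
            unsigned long vaddr;

            pagefault_disable();
            if (!PageHighMem(page))
                    return page_address(page);

            idx = type + KM_TYPE_NR * smp_processor_id();
            vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
            /* the caller chooses the protections, e.g. PAGE_KERNEL_RO */
            set_pte(kmap_pte - idx, mk_pte(page, prot));
            arch_flush_lazy_mmu_mode();

            return (void *)vaddr;
    }

    /* kmap_atomic() then reduces to: */
    void *kmap_atomic(struct page *page, enum km_type type)
    {
            return kmap_atomic_prot(page, type, kmap_prot);
    }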
-
Committed by Jeremy Fitzhardinge

Normally when running in PAE mode, the 4th PMD maps the kernel address space, which can be shared among all processes (since they all need the same kernel mappings). Xen, however, does not allow guests to have the kernel pmd shared between page tables, so parameterize pgtable.c to allow both modes of operation.

There are several side-effects of this. One is that vmalloc will update the kernel address space mappings, and those updates need to be propagated into all processes if the kernel mappings are not intrinsically shared. In the non-PAE case, this is done by maintaining a pgd_list of all processes; this list is used when all process pagetables must be updated. pgd_list is threaded via otherwise unused entries in the page structure for the pgd, which means that the pgd must be page-sized for this to work.

Normally the PAE pgd is only four 64-bit entries large, but Xen requires the PAE pgd to be page-aligned anyway, so this patch forces the pgd to be page aligned and sized when the kernel pmd is unshared, to accommodate both of these requirements.

Also, since there may be several distinct kernel pmds (if the user/kernel split is below 3G), there's no point in allocating them from a slab cache; they're just allocated with get_free_page and initialized appropriately. (Of course they could be cached if there is just a single kernel pmd, which is the default with a 3G user/kernel split, but it doesn't seem worthwhile to add yet another case into this code.)

[ Many thanks to wli for review comments. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
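The pgd_list propagation the text refers to looks roughly like the 2.6.21-era i386 idiom below (a sketch of the pattern, not the verbatim patch; the list-threading detail through page->index is an assumption about this era's layout):

    struct page *page;
    unsigned long flags;

    /* Push a kernel-range mapping change into every process pagetable;
     * the list is threaded through otherwise-unused page struct fields. */
    spin_lock_irqsave(&pgd_lock, flags);
    for (page = pgd_list; page; page = (struct page *)page->index) {
            pgd_t *pgd = (pgd_t *)page_address(page);
            /* ... update the kernel entries of this pgd ... */
    }
    spin_unlock_irqrestore(&pgd_lock, flags);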
-
Committed by Jeremy Fitzhardinge

This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up.

In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc.

When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the construction of the kernel's pagetable depends on the hypervisor. In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only.

In order to make this easier, the kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones.

A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables.

And:

From: "Eric W. Biederman" <ebiederm@xmission.com>

If we don't set the leaf page table entries, it is quite possible that we will inherit an incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like.

Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings; however, I discussed this with Jeremy and he has modified the Xen early set_pte function to avoid problems in this area.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
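The "only allocate if nothing is mapped yet" rule, sketched along the lines of the 2.6.22 i386 one_page_table_init() (simplified; the paravirt notification calls are omitted):

    static pte_t * __init one_page_table_init(pmd_t *pmd)
    {
            /* leave any hypervisor-provided mapping in place */
            if (!(pmd_val(*pmd) & _PAGE_PRESENT)) {
                    pte_t *pte = alloc_bootmem_low_pages(PAGE_SIZE);

                    set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
            }
            return pte_offset_kernel(pmd, 0);
    }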
-
Committed by Jeremy Fitzhardinge

Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down.

This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap.

[ The patch has been through several hands: Jan Beulich wrote the original version; Zach reworked it, and Jeremy converted it to relocate phdrs as well as sections. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
-
Committed by Jan Beulich

On x86-64, kernel memory freed after init can be entirely unmapped instead of just getting 'poisoned' by overwriting with a debug pattern. On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and the bug table can also be write-protected.

Compared to the first version, this one prevents re-creating deleted mappings in the kernel image range on x86-64, if those got removed previously. This, together with the original changes, prevents temporarily having inconsistent mappings when cacheability attributes are being changed on such pages (e.g. from AGP code). While such duplicate mappings don't exist on i386, the same change is done there, too, both for consistency and because checking pte_present() before using various other pte_XXX functions is a requirement anyway. At the same time, the i386 code gets adjusted to use pte_huge() instead of open-coding it.

AK: split out the cpa() changes

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jan Beulich

Fix various broken corner cases in the i386 and x86-64 change_page_attr().

AK: split off from "tighten kernel image access rights"

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Vivek Goyal

Currently __pa_symbol() is for use with symbols in the kernel address map and __pa() is for use with pointers into the physical memory map. But the code is implemented so you can usually interchange the two.

__pa(), which is much more common, can be implemented much more cheaply if it doesn't have to worry about any other kernel address spaces. This is especially true with a relocatable kernel, as __pa_symbol() needs to perform an extra variable read to resolve the address.

There is a third macro that is added for the vsyscall data, __pa_vsymbol, for finding the physical addresses of vsyscall pages.

Most of this patch is simply sorting through the references to __pa or __pa_symbol and using the proper one. A little of it is continuing to use a physical address when we have it instead of recalculating it several times.

swapper_pgd is now NULL. leave_mm now uses init_mm.pgd, and init_mm.pgd is initialized at boot (instead of compile time) to the physmem virtual mapping of init_level4_pgd. The physical address changed.

Except for the EMPTY_ZERO page, all of the remaining references to __pa_symbol appear to be during kernel initialization. So this should reduce the cost of __pa in the common case, even on a relocated kernel.

As this is technically a semantic change, we need to be on the lookout for anything I missed. But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
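The distinction being drawn, in macro form (an x86-64 sketch under the relocatable-kernel assumption; the patch's exact definitions differ in bookkeeping):

    /* physical-memory-map pointers: a plain constant offset */
    #define __pa(x)        ((unsigned long)(x) - PAGE_OFFSET)

    /* kernel-image symbols: needs phys_base, a variable read when the
     * kernel has been relocated */
    #define __pa_symbol(x) \
            ((unsigned long)(x) - __START_KERNEL_map + phys_base)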
-
Committed by Jan Beulich

- make the page table contents printing PAE capable
- make sure the address stored in current->thread.cr2 is unmodified from what was read from CR2
- don't call oops_may_print() multiple times, when one time suffices
- print the pte even in the highpte case, as long as the pte page isn't actually in high memory (which is specifically the case for all page tables covering kernel space)

(Changes to v3: Use sizeof()*2 rather than the suggested sizeof()*4 for the printing width, use a fixed 16-nibble width for PAE, and also apply the max_low_pfn range check to the middle-level lookup on PAE.)

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
- April 09, 2007: 1 commit
-
-
Committed by Zachary Amsden

Since lazy MMU batching mode still allows interrupts to enter, it is possible for interrupt handlers to try to use kmap_atomic, which fails when lazy mode is active, since the PTE update to highmem will be delayed. The best workaround is to issue an explicit flush in the kmap_atomic functions, as this is the only way nested PTE updates can happen in the interrupt handler.

Thanks to Jeremy Fitzhardinge for noting the bug and for suggestions on a fix.

This patch gets reverted again when we start 2.6.22 and the bug gets fixed differently.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- February 13, 2007: 4 commits
-
-
Committed by Rusty Russell

Extern declarations belong in headers. Times, they are a'changin.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jan Beulich

Remove all parameters from this function that aren't really variable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Zachary Amsden

A fairly straightforward implementation of the VMI backend for paravirt-ops.

[Adrian Bunk: some cleanups]

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
Committed by Zachary Amsden

The VMI backend uses explicit page type notification to track shadow page tables. The allocation of page table roots is especially tricky. We need to clone the root for non-PAE mode while it is protected under the pgd lock to correctly copy the shadow.

We don't need to allocate pgds in PAE mode (PDPs in Intel terminology), as they only have 4 entries and are cached entirely by the processor, which makes shadowing them rather simple.

For base page table level allocation, pmd_populate provides the exact hook point we need. Also, we need to allocate pages when splitting a large page, and we must release pages before returning the page to any free pool.

Despite being required with these slightly odd semantics for VMI, Xen also uses these hooks to determine the exact moment when page tables are created or released.

AK: All nops for other architectures

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
-
- February 12, 2007: 2 commits
-
-
Committed by Ingo Molnar

Catch illegally nested kmap_atomic()s even if the page that is mapped by the 'inner' instance is from lowmem. This avoids spuriously zapped kmap-atomic ptes and turns hard-to-find crashes into clear asserts at the bug site.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
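The essence is that the "slot is free" assertion runs before the lowmem early-out, so nesting on the same km_type is caught even for lowmem pages; a sketch against the 2.6.20-era i386 kmap_atomic() (abbreviated, not verbatim):

    void *kmap_atomic(struct page *page, enum km_type type)
    {
            enum fixed_addresses idx = type + KM_TYPE_NR * smp_processor_id();

            pagefault_disable();

            /* checked before the PageHighMem early return, so an illegally
             * nested lowmem mapping trips the assert instead of silently
             * zapping the outer mapping's pte on unmap */
            BUG_ON(!pte_none(*(kmap_pte - idx)));

            if (!PageHighMem(page))
                    return page_address(page);

            /* ... establish the fixmap mapping as before ... */
    }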
-
Committed by Kirill Korotaev

Part of a long-forgotten patch: http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source

Since then, m32r grabbed two copies. Leave the s390 copy because of the important absence of CONFIG_VT, but remove references to the non-existent timerlist_lock. ia64 also loses timerlist_lock.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- February 10, 2007: 1 commit
-
-
Committed by Al Viro

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- January 11, 2007: 1 commit
-
-
Committed by Vivek Goyal

Fix a modpost-generated warning:

    WARNING: vmlinux - Section mismatch: reference to .init.text: from .text between 'add_one_highpage_hotplug' (at offset 0xc0113d3f) and 'online_page'

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
-
- December 23, 2006: 1 commit
-
-
Committed by Yasunori Goto

Fix a compile error when configuring memory hotplug with NUMA on i386. The cause of the compile error was the missing arch_add_memory(), remove_memory(), and memory_add_physaddr_to_nid().

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@cs.washington.edu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- December 08, 2006: 4 commits
-
-
Committed by Christoph Lameter

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #
    set -e
    for file in `find * -name "*.c" -o -name "*.h" | xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file > /tmp/$$
            mv /tmp/$$ $file
            quilt refresh
    done

The script was run like this:

    sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jeremy Fitzhardinge

kunmap_atomic() will call kpte_clear_flush with vaddr/ptep arguments which don't correspond if the vaddr is just a normal lowmem address (i.e., not in the KMAP area). This patch makes sure that the pte is only cleared if the kmap area was actually used for the mapping.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Peter Zijlstra

Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing.

Currently they still rely on the increased preempt count, but do not rely on the disabled preemption; this might go away in the future.

(NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good.)

[heiko.carstens@de.ibm.com: s390 fix]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
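A sketch of the helpers, close to the 2.6.20-era linux/uaccess.h:

    static inline void pagefault_disable(void)
    {
            inc_preempt_count();
            /* make sure the count is visible before a fault can hit */
            barrier();
    }

    static inline void pagefault_enable(void)
    {
            /* issue outstanding loads/stores before re-enabling faults */
            barrier();
            dec_preempt_count();
            /* re-check for pending preemption now that faults are allowed */
            preempt_check_resched();
    }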
-
Committed by Chen, Kenneth W

Following up on the shared page table work done by Dave McCracken, this set of patches targets shared page tables for hugetlb memory only.

Shared page tables are particularly useful in the situation of a large number of independent processes sharing large shared memory segments. In the normal page case, the amount of memory saved from the processes' page tables is quite significant. For hugetlb, the saving on page table memory is not the primary objective (as hugetlb itself already cuts down page table overhead significantly); instead, the purpose of using shared page tables on hugetlb is to allow faster TLB refill and smaller cache pollution upon TLB miss.

With PT sharing, pte entries are shared among hundreds of processes; the cache consumption used by all the page tables is smaller, and in return the application gets a much higher cache hit ratio. One other effect is that the cache hit ratio with the hardware page walker hitting on a pte in cache will be higher, and this helps to reduce TLB miss latency. These two effects contribute to higher application performance.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Dave McCracken <dmccr@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- December 07, 2006: 6 commits
-
-
Committed by Artiom Myaskouvskey

When using the memmap kernel parameter in an EFI boot, we should also add the memory regions of runtime services to the memory map, to enable their mapping later.

AK: merged and cleaned up the patch

Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jan Beulich

While not strictly required with the current code (as the upper half of page table entries generated by __set_fixmap() cannot be non-zero, due to the second parameter of this function being 'unsigned long'), the use of set_pte() in __set_fixmap() in the context of clear_fixmap() is still improper with CONFIG_X86_PAE (see the respective comment in include/asm-i386/pgtable-3level.h) and would turn into a bug if that second parameter ever gets changed to a 64-bit type.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Rusty Russell

Add the three bare TLB accessor functions to paravirt-ops. Most amusingly, flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb. Instead, I chose to indicate the actual flush type: kernel (global) vs. user (non-global). Global in this sense means using the global bit in the page table entry, which makes TLB entries persistent across CR3 reloads, not global as in the SMP sense of invoking remote shootdowns, so the term is confusingly overloaded.

AK: folded in a fix from Zach for PAE compilation

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
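A sketch of the three hooks and how the low-level macros route through them (names assumed to follow the 2.6.20-era paravirt_ops; simplified):

    struct paravirt_ops {
            /* ... */
            void (*flush_tlb_user)(void);    /* non-global entries only */
            void (*flush_tlb_kernel)(void);  /* global entries too */
            void (*flush_tlb_single)(unsigned long addr);
    };

    #define __flush_tlb()           paravirt_ops.flush_tlb_user()
    #define __flush_tlb_global()    paravirt_ops.flush_tlb_kernel()
    #define __flush_tlb_single(a)   paravirt_ops.flush_tlb_single(a)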
-
Committed by David Rientjes

Substitute the allocate_pgdat() virtual address lookup with the pfn_to_kaddr() macro.

Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
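In concrete terms (a sketch; the macro definition matches the i386 asm/page.h, while the call site shown is illustrative rather than the exact allocate_pgdat() line):

    #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)

    /* so an open-coded lookup like */
    NODE_DATA(nid) = (pg_data_t *)__va(pfn << PAGE_SHIFT);
    /* becomes */
    NODE_DATA(nid) = (pg_data_t *)pfn_to_kaddr(pfn);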
-
Committed by Andi Kleen

Makes the intention of the code clearer to read and avoids a potential deadlock on mmap_sem. Also change the types of the arguments to not include __user, because they're really not user addresses.

Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Andi Kleen

CLFLUSH is a lot faster than WBINVD, so try to use that.

Signed-off-by: Andi Kleen <ak@suse.de>
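The per-page flush this enables looks roughly like the following (a sketch patterned on the pageattr.c helper of this era; the line size comes from cpuid via boot_cpu_data):

    static void cache_flush_page(void *addr)
    {
            int i;

            /* flush only the lines covering this page, instead of writing
             * back and invalidating the entire cache (WBINVD) */
            for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
                    asm volatile("clflush (%0)" :: "r" (addr + i) : "memory");
    }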
-
- October 12, 2006: 1 commit
-
-
Committed by Mel Gorman

Arch-independent zone-sizing is using indices instead of symbolic names to offset within an array related to zones (max_zone_pfns). The unintended impact is that ZONE_DMA and ZONE_NORMAL are initialised on powerpc instead of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set. As a result, the machine fails to boot, but will boot with CONFIG_HIGHMEM turned off.

The following patch properly initialises the max_zone_pfns[] array and uses symbolic names instead of indices in each architecture using arch-independent zone-sizing. Two users have successfully booted their powerpcs with it (one an ibook G4). It has also been boot-tested on x86, x86_64, ppc64 and ia64. Please merge for 2.6.19-rc2.

Credit to Benjamin Herrenschmidt for identifying the bug and rolling the first fix. Additional credit to Johannes Berg and Andreas Schwab for reporting the problem and testing on powerpc.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
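After the fix, each architecture fills the array by zone name rather than by position; the i386 pattern looks roughly like this (a sketch):

    unsigned long max_zone_pfns[MAX_NR_ZONES];

    memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
    max_zone_pfns[ZONE_DMA] =
            virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
    max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
    #ifdef CONFIG_HIGHMEM
    max_zone_pfns[ZONE_HIGHMEM] = highend_pfn;
    #endif
    free_area_init_nodes(max_zone_pfns);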
-
- October 04, 2006: 1 commit
-
-
Committed by Eric Sesterhenn

This changes a couple of if () BUG(); constructs to BUG_ON(); so that they can be safely optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
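The transformation, on an illustrative side-effect-free check:

    /* before: two statements; the branch survives even when BUG()
     * is configured out */
    if (!pte_none(*pte))
            BUG();

    /* after: a single expression the build can compile away entirely */
    BUG_ON(!pte_none(*pte));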
-
- October 01, 2006: 3 commits
-
-
Committed by Zachary Amsden

Add a pte_update_hook which notifies about pte changes that have been made without using the set_pte / clear_pte interfaces. This allows shadow-mode hypervisors which do not trap on page table access to maintain synchronized shadows.

It also turns out there was one pte update in PAE mode that wasn't using any accessor interface at all for setting NX protection. Considering it is PAE-specific, and the accessor is i386-specific, I didn't want to add a generic encapsulation of this behavior yet.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Zachary Amsden

Create a new PTE function which combines clearing a kernel PTE with the subsequent flush. This allows the two to be easily combined into a single hypercall or paravirt-op.

More subtly, reverse the order of the flush for kmap_atomic. Instead of flushing on establishing a mapping, flush on clearing a mapping. This eliminates the possibility of leaving stale kmap entries which may still have valid TLB mappings. This is required for direct-mode hypervisors, which need to reprotect all mappings of a given page when changing the page type from a normal page to a protected page (such as a page table or descriptor table page). But it also provides some nicer semantics for real hardware, by providing extra debug-proofing against using stale mappings, as well as ensuring that no stale mappings exist when changing the cacheability attributes of a page, which could lead to cache conflicts when two different types of mappings exist for the same page.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
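The combined helper, roughly as it appears in the 2.6.19-era asm-i386/pgtable.h (a sketch):

    #define kpte_clear_flush(ptep, vaddr)           \
    do {                                            \
            pte_clear(&init_mm, (vaddr), (ptep));   \
            __flush_tlb_one(vaddr);                 \
    } while (0)

With a paravirt backend, the clear and the flush can then be folded into a single hypercall instead of two separate operations.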
-
Committed by Haavard Skinnemoen

Convert i386 to use the generic ioremap_page_range().

[bunk@stusta.de: build fix]

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- September 30, 2006: 2 commits
-
-
Committed by Sukadev Bhattiprolu

This is an updated version of Eric Biederman's is_init() patch (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and replaces a few more instances of ->pid == 1 with is_init(). Further, is_init() checks the pid and thus removes the dependency on Eric's other patches for now.

Eric's original description:

There are a lot of places in the kernel where we test for init because we give it special properties. Most significantly, init must not die. This results in code all over the kernel testing ->pid == 1.

Introduce is_init() to capture this case.

With multiple pid spaces, in all of the cases affected we are looking for only the first process on the system, not some other process that has pid == 1.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
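The helper itself is one line (matching the 2.6.19-era linux/sched.h), and call sites change mechanically:

    static inline int is_init(struct task_struct *tsk)
    {
            return tsk->pid == 1;
    }

    /* so a check such as */
    if (current->pid == 1)
            return;
    /* becomes */
    if (is_init(current))
            return;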
-
Committed by Jason Baron

Make PROT_WRITE imply PROT_READ for a number of architectures which don't support write-only in hardware.

While looking at this, I noticed that some architectures which do not support write-only mappings already take the exact same approach. For example, in arch/alpha/mm/fault.c:

    if (cause < 0) {
            if (!(vma->vm_flags & VM_EXEC))
                    goto bad_area;
    } else if (!cause) {
            /* Allow reads even for write-only mappings */
            if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                    goto bad_area;
    } else {
            if (!(vma->vm_flags & VM_WRITE))
                    goto bad_area;
    }

Thus, this patch brings the other architectures which do not support write-only mappings in line and consistent with the rest. I've verified the patch on ia64, x86_64 and x86.

Additional discussion: Several architectures, including x86, cannot support write-only mappings. The pte for x86 reserves a single bit for protection, and its two states are read-only or read/write. Thus, write-only is not supported in hardware.

Currently, if I 'mmap' a page write-only, the first read attempt on that page creates a page fault and will SEGV. That check is enforced in arch/blah/mm/fault.c. However, if I first write to that page, it will fault in and the pte will be set to read/write. Thus, any subsequent reads to the page will succeed. It is this inconsistency in behavior that this patch is attempting to address. Furthermore, if the page is swapped out and then brought back, the first read will also cause a SEGV. Thus, any arbitrary read on a page can potentially result in a SEGV.

According to the SuSv3 spec, "if the application requests only PROT_WRITE, the implementation may also allow read access." Also, as mentioned, some architectures, such as alpha (shown above), already take the approach that I am suggesting.

The counter-argument to this, raised by Arjan, is that the kernel is enforcing the write-only mapping as best it can given the hardware limitations. This is true; however, Alan Cox and myself would argue that the inconsistency in behavior, that is, applications sometimes working and sometimes failing, is highly undesirable. If you read through the thread, I think people came to an agreement on the last patch I posted, as nobody has objected to it...

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
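On i386 the corresponding change is a one-line widening of the fault handler's access check; a sketch (error_code bit 1 = write, bit 0 = protection; assumed to follow the 2.6.19 arch/i386/mm/fault.c structure):

    switch (error_code & 3) {
            /* ... write cases unchanged ... */
    case 0:         /* read fault, page not present */
            /* VM_WRITE added to the mask, so a write-only mapping is
             * faulted in readably instead of raising SIGSEGV */
            if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
                    goto bad_area;
            break;
    }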
-