Commits · d040c1614c24162adc3fe106b182596999264e26 · openeuler / raspberrypi-kernel

13 Feb, 2009 1 commit

x86: CPA avoid repeated lazy mmu flush · 7ad9de6a

Committed by Thomas Gleixner on Feb 12, 2009

Impact: Flush the lazy MMU only once

Pending mmu updates only need to be flushed once to bring the
in-memory pagetable state up to date.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

7ad9de6a

12 Feb, 2009 6 commits

x86, 32-bit: refactor find_low_pfn_range() · d88316c2

Committed by Ingo Molnar on Feb 12, 2009

Impact: cleanup

Make the max_low_pfn logic a bit more standard between
lowmem_pfn_init() and highmem_pfn_init().
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d88316c2

x86, 32-bit: clean up find_low_pfn_range() · 4769843b

Committed by Ingo Molnar on Feb 12, 2009

Impact: cleanup

Split find_low_pfn_range() into two functions:

 - lowmem_pfn_init()
 - highmem_pfn_init()

The former gets called if all of RAM fits into lowmem,
otherwise we call highmem_pfn_init().
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4769843b

x86: fix warning in find_low_pfn_range() · 3023533d

Committed by Ingo Molnar on Feb 12, 2009

Signed-off-by: Ingo Molnar <mingo@elte.hu>

3023533d

x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem · be03d9e8

Committed by Suresh Siddha on Feb 11, 2009

Jeff Mahoney reported:

> With Suse's hwinfo tool, on -tip:
> WARNING: at arch/x86/mm/pat.c:637 reserve_pfn_range+0x5b/0x26d()

reserve_pfn_range() is not tracking the memory range below 1MB
as non-RAM and as such is inconsistent with similar checks in
reserve_memtype() and free_memtype()

Rename the pagerange_is_ram() to pat_pagerange_is_ram() and add the
"track legacy 1MB region as non RAM" condition.

And also, fix reserve_pfn_range() to return -EINVAL, when the pfn
range is RAM. This is to be consistent with this API design.
Reported-and-tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

be03d9e8
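
As an illustration of commit be03d9e8, the range classification it describes can be sketched in userspace C. This is a minimal model, not the kernel code: `model_page_is_ram()` and the RAM layout it encodes are hypothetical, and the return convention (1 for all RAM, 0 for no RAM, -1 for mixed) is an assumption of this sketch. Only the rule that the legacy region below 1MB is tracked as non-RAM comes from the commit message.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_LEGACY_END 0x100000UL /* end of the legacy 1MB region */

/* Hypothetical RAM map for this sketch: 0-640KB and 1MB-16MB are RAM. */
static bool model_page_is_ram(unsigned long pfn)
{
    unsigned long addr = pfn << MODEL_PAGE_SHIFT;

    return addr < 0xA0000UL || (addr >= 0x100000UL && addr < 0x1000000UL);
}

/*
 * Classify the physical range [start, end):
 *   1  every page is tracked as RAM
 *   0  no page is tracked as RAM
 *  -1  the range mixes both kinds
 * Pages below 1MB always count as non-RAM, whatever the RAM map says.
 */
int model_pat_pagerange_is_ram(unsigned long start, unsigned long end)
{
    bool saw_ram = false, saw_non_ram = false;
    unsigned long pfn;

    for (pfn = start >> MODEL_PAGE_SHIFT; pfn < (end >> MODEL_PAGE_SHIFT); pfn++) {
        if (pfn >= (MODEL_LEGACY_END >> MODEL_PAGE_SHIFT) && model_page_is_ram(pfn))
            saw_ram = true;
        else
            saw_non_ram = true;

        if (saw_ram && saw_non_ram)
            return -1;
    }
    return saw_ram ? 1 : 0;
}
```

In this model, a caller in the role of reserve_pfn_range() would reject a range classified as 1 (RAM) with -EINVAL, matching the API behavior the commit message describes.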

x86/cpa: make sure cpa is safe to call in lazy mmu mode · 4f06b043

Committed by Jeremy Fitzhardinge on Feb 11, 2009

Impact: fix race leading to crash under KVM and Xen

The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4f06b043
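
To make the problem in commit 4f06b043 concrete, here is a small userspace model of lazy MMU updates. It is a sketch under stated assumptions: every name (`model_set_pte`, `model_lookup_pte`, the fixed-size queue) is hypothetical and none of it is kernel API. It only shows why a reader of the in-memory table has to flush pending updates first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES   8
#define MODEL_MAX_QUEUE 8

struct model_update {
    size_t index;
    unsigned long value;
};

static unsigned long model_table[MODEL_NR_PTES];         /* in-memory pagetable */
static struct model_update model_queue[MODEL_MAX_QUEUE]; /* queued, not applied */
static size_t model_queued;
static bool model_lazy;

/* Apply every queued update so the in-memory table is up to date. */
void model_flush_lazy_mmu(void)
{
    for (size_t i = 0; i < model_queued; i++)
        model_table[model_queue[i].index] = model_queue[i].value;
    model_queued = 0;
}

void model_enter_lazy_mmu(void)
{
    model_lazy = true;
}

void model_leave_lazy_mmu(void)
{
    model_flush_lazy_mmu();
    model_lazy = false;
}

/* In lazy mode the write is only queued; otherwise it lands immediately. */
void model_set_pte(size_t index, unsigned long value)
{
    assert(index < MODEL_NR_PTES);
    if (!model_lazy) {
        model_table[index] = value;
        return;
    }
    if (model_queued == MODEL_MAX_QUEUE)
        model_flush_lazy_mmu();
    model_queue[model_queued].index = index;
    model_queue[model_queued].value = value;
    model_queued++;
}

/* Unsafe read: returns whatever is in memory, possibly stale. */
unsigned long model_raw_pte(size_t index)
{
    assert(index < MODEL_NR_PTES);
    return model_table[index];
}

/* What a CPA-like caller should do: flush pending updates, then inspect. */
unsigned long model_lookup_pte(size_t index)
{
    if (model_lazy)
        model_flush_lazy_mmu();
    return model_raw_pte(index);
}
```

The flush in `model_lookup_pte()` empties the queue, so repeated lookups in the same lazy section do not redo any work.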

x86: mm/init_32.c fix compilation warning · 7651194f

Committed by Jaswinder Singh Rajput on Feb 11, 2009

arch/x86/mm/init_32.c: In function ‘find_low_pfn_range’:
arch/x86/mm/init_32.c:696: warning: format ‘%u’ expects type ‘unsigned int’, but
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7651194f

09 Feb, 2009 1 commit

x86: fix abuse of per_cpu_offset · 44581a28

Committed by Brian Gerst on Feb 08, 2009

Impact: bug fix

Don't use per_cpu_offset() to determine if it is valid to access a
per-cpu variable for a given cpu number.  It is not a valid assumption
on x86-64 anymore. Use cpu_possible() instead.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

44581a28
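
The check recommended in commit 44581a28 can be sketched as follows. This is a userspace model with hypothetical names (`model_cpu_possible`, `model_read_counter`) and a made-up possible-CPU mask. It only illustrates guarding a per-CPU access with an explicit "is this CPU possible" test instead of inferring validity from an offset value.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical mask for this sketch: CPUs 0-3 are possible. */
static const unsigned int model_possible_mask = 0x0Fu;

/* One slot per CPU number, standing in for a per-cpu variable. */
static int model_counter[MODEL_NR_CPUS];

bool model_cpu_possible(unsigned int cpu)
{
    return cpu < MODEL_NR_CPUS && ((model_possible_mask >> cpu) & 1u);
}

/*
 * Store the counter of `cpu` in *out and return 0, or return -1 without
 * touching *out when the CPU number must not be accessed.
 */
int model_read_counter(unsigned int cpu, int *out)
{
    if (!model_cpu_possible(cpu))
        return -1;
    *out = model_counter[cpu];
    return 0;
}
```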

06 Feb, 2009 1 commit

prevent kprobes from catching spurious page faults · 9be260a6

Committed by Masami Hiramatsu on Feb 05, 2009

Prevent kprobes from catching spurious faults which will cause infinite
recursive page-fault and memory corruption by stack overflow.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9be260a6

05 Feb, 2009 1 commit

x86: mm: introduce helper function in fault.c · 0973a06c

Committed by Hiroshi Shimamoto on Feb 04, 2009

Impact: cleanup

Introduce helper function fault_in_kernel_address() to make editors happy.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

0973a06c

31 Jan, 2009 1 commit

x86: update copyrights · 8f47e163

Committed by Ingo Molnar on Jan 31, 2009

Signed-off-by: Ingo Molnar <mingo@elte.hu>

8f47e163

29 Jan, 2009 4 commits

x86: add might_sleep() to do_page_fault() · 01006074

Committed by Peter Zijlstra on Jan 29, 2009

Impact: widen debug checks

VirtualBox calls do_page_fault() from an atomic context but runs into a
might_sleep() way past this point; cure that.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

01006074

x86: replace CONFIG_X86_SMP with CONFIG_SMP · 3e5095d1

Committed by Ingo Molnar on Jan 27, 2009

The x86/Voyager subarch used to have this distinction between
 'x86 SMP support' and 'Voyager SMP support':

 config X86_SMP
	bool
	depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)

This is a pointless distinction - Voyager can (and already does) use
smp_ops to implement various SMP quirks it has - and it can be extended
more to cover all the specialities of Voyager.

So remove this complication in the Kconfig space.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3e5095d1

x86, smp: remove mach_ipi.h · d53e2f28

Committed by Ingo Molnar on Jan 28, 2009

Move mach_ipi.h definitions into genapic.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d53e2f28

x86, apic: untangle the send_IPI_*() jungle · dac5f412

Committed by Ingo Molnar on Jan 28, 2009

Our send_IPI_*() methods and definitions are a twisted mess: the same
symbol is defined to different things depending on .config details,
in a non-transparent way.

 - spread out the quirks into separately named per apic driver methods

 - prefix the standard PC methods with default_

 - get rid of wrapper macro obfuscation

 - clean up various details
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dac5f412

27 Jan, 2009 1 commit

x86: move 64-bit NUMA code · 6470aff6

Committed by Brian Gerst on Jan 27, 2009

Impact: Code movement, no functional change.

Move the 64-bit NUMA code from setup_percpu.c to numa_64.c
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

6470aff6

26 Jan, 2009 1 commit

x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn. · ef5fa0ab

Committed by Eric Anholt on Jan 23, 2009

In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that
gets UC behavior even in the presence of a WC MTRR covering the area in
question.  By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual
behavior the caller wanted (WC if you can manage it, UC otherwise).

This recovers the 40% performance improvement of using WC in the DRM
to upload vertex data.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ef5fa0ab
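
The behavior that commit ef5fa0ab relies on can be written down as a tiny lookup. This sketch assumes the usual x86 rule that a UC- page attribute yields to a write-combining MTRR while a strong UC attribute does not. The enum and function names are hypothetical, and only the two page attributes discussed in the commit message are modelled.

```c
enum model_cache_type {
    MODEL_UC,       /* strong uncached */
    MODEL_UC_MINUS, /* uncached, but can be overridden by a WC MTRR */
    MODEL_WC,       /* write-combining */
    MODEL_WB        /* write-back */
};

/*
 * Effective memory type for a page attribute of UC or UC- combined with
 * the MTRR type covering the same physical range.
 */
enum model_cache_type model_effective_type(enum model_cache_type page_attr,
                                           enum model_cache_type mtrr_type)
{
    /* UC- lets a WC MTRR win, which gives "WC if you can manage it". */
    if (page_attr == MODEL_UC_MINUS && mtrr_type == MODEL_WC)
        return MODEL_WC;

    /* Strong UC, and every combination not modelled here, stays uncached. */
    return MODEL_UC;
}
```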

24 Jan, 2009 2 commits

x86: handle PAT more like other CPU features · 75a04811

Committed by H. Peter Anvin on Jan 22, 2009

Impact: Cleanup

When PAT was originally introduced, it was handled specially for a few
reasons:

- PAT bugs are hard to track down, so we wanted to maintain a
  whitelist of CPUs.
- The i386 and x86-64 CPUID code was not yet unified.

Both of these are now obsolete, so handle PAT like any other feature,
including ordinary feature blacklisting due to known bugs.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

75a04811

x86: uaccess: introduce try and catch framework · fe40c0af

Committed by Hiroshi Shimamoto on Jan 23, 2009

Impact: introduce new uaccess exception handling framework

Introduce {get|put}_user_try and {get|put}_user_catch as a new uaccess exception
handling framework.
{get|put}_user_try begins an exception block and {get|put}_user_catch(err) ends
the block and sets err if an exception occurred in {get|put}_user_ex() within the
block. The exception is stored in thread_info->uaccess_err.

An example usage of this framework is below:
int func()
{
	int err = 0;

	get_user_try {
		get_user_ex(...);
		get_user_ex(...);
		:
	} get_user_catch(err);

	return err;
}

Note: get_user_ex() does not clear the value when an exception occurs; this
differs from the behavior of __get_user(), but I think it doesn't matter.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

fe40c0af
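
The try/catch shape introduced by commit fe40c0af can be imitated in userspace to show how the macros pair up. This is a simplified model, not the kernel implementation: the `model_` macros, the global `model_uaccess_err` (standing in for thread_info->uaccess_err), and the NULL-pointer failure rule in `model_read_user()` are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for thread_info->uaccess_err in this userspace model. */
static int model_uaccess_err;

/* Hypothetical "user memory" read that faults on a NULL source pointer. */
static int model_read_user(int *dst, const int *src)
{
    if (src == NULL)
        return -EFAULT;
    *dst = *src;
    return 0;
}

/* Open the block and reset the recorded error. */
#define model_get_user_try do { model_uaccess_err = 0;

/* Record a failure but keep going; dst is left untouched on failure. */
#define model_get_user_ex(dst, src)                  \
    do {                                             \
        if (model_read_user(&(dst), (src)) != 0)     \
            model_uaccess_err = -EFAULT;             \
    } while (0)

/* Close the block and fold the recorded error into err. */
#define model_get_user_catch(err)                    \
    (err) |= model_uaccess_err; } while (0)

/* Copy two values, checking for errors once at the end of the block. */
int model_copy_pair(const int *ua, const int *ub, int *a, int *b)
{
    int err = 0;

    model_get_user_try {
        model_get_user_ex(*a, ua);
        model_get_user_ex(*b, ub);
    } model_get_user_catch(err);

    return err;
}
```

The point of the pattern is that individual accesses carry no error check of their own; the caller tests a single err value after the block.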

22 Jan, 2009 3 commits

x86 PAT: ioremap_wc should take resource_size_t parameter · d639bab8

Committed by venkatesh.pallipadi@intel.com on Jan 09, 2009

Impact: fix/extend ioremap_wc() beyond 4GB aperture on 32-bit

ioremap_wc() was taking an unsigned long parameter, whereas it should take a
64-bit resource_size_t parameter like other ioremap variants.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d639bab8
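
The truncation that commit d639bab8 fixes is easy to reproduce with fixed-width types. In this sketch `uint32_t` stands in for `unsigned long` on a 32-bit kernel and `uint64_t` for a 64-bit resource_size_t. Both function names are hypothetical.

```c
#include <stdint.h>

/* A 32-bit `unsigned long` parameter silently drops the upper 32 bits. */
uint64_t model_addr_via_ulong32(uint64_t phys_addr)
{
    uint32_t narrowed = (uint32_t)phys_addr;

    return narrowed;
}

/* A 64-bit resource_size_t parameter carries the address through intact. */
uint64_t model_addr_via_resource_size(uint64_t phys_addr)
{
    return phys_addr;
}
```

An aperture at 0x1C0000000 (7GB) would be seen as 0xC0000000 by the first function, which is the kind of mis-mapping beyond 4GB that the commit's impact line refers to.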

x86: optimise page fault entry, cleanup · fb746d0e

Committed by Johannes Weiner on Jan 21, 2009

tsk is already assigned to current, drop the redundant second
assignment.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fb746d0e

x86: fix PTE corruption issue while mapping RAM using /dev/mem · 95971342

Committed by Suresh Siddha on Jan 13, 2009

Beschorner Daniel reported:
> hwinfo problem since 2.6.28, showing this in the oops:
>	Corrupted page table at address 7fd04de3ec00

Also, PaX Team reported a regression with this commit:

>	commit 9542ada8
>	Author: Suresh Siddha <suresh.b.siddha@intel.com>
>	Date:   Wed Sep 24 08:53:33 2008 -0700
>
>	    x86: track memtype for RAM in page struct

This commit breaks mapping any RAM page through /dev/mem, as the
reserve_memtype() was not initializing the return attribute type and as such
corrupting the PTE entry that was setup with the return attribute type.

Because of this bug, application mapping this RAM page through /dev/mem
will die with "Corrupted page table at address xxxx" message in the kernel
log and also the kernel identity mapping which maps the underlying RAM
page gets converted to UC.

Fix this by initializing the return attribute type before calling
reserve_ram_pages_type()
Reported-by: PaX Team <pageexec@freemail.hu>
Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

95971342

21 Jan, 2009 3 commits

x86: fix page attribute corruption with cpa() · a1e46212

Committed by Suresh Siddha on Jan 20, 2009

Impact: fix sporadic slowdowns and warning messages

This patch fixes a performance issue reported by Linus on his
Nehalem system. While Linus reverted the PAT patch (commit
58dab916) which exposed the issue,
existing cpa() code can potentially still cause wrong behavior (page
attribute corruption).

This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
various people reported.

In 64bit kernel, kernel identity mapping might have holes depending
on the available memory and how e820 reports the address range
covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
in the address range that is not listed by e820 entries, kernel identity
mapping will have a corresponding hole in its 1-1 identity mapping.

If cpa() happens on the kernel identity mapping which falls into these holes,
existing code fails like this:

	__change_page_attr_set_clr()
		__change_page_attr()
			returns 0 because of if (!kpte). But doesn't
			set cpa->numpages and cpa->pfn.
		cpa_process_alias()
			uses uninitialized cpa->pfn (random value)
			which can potentially lead to changing the page
			attribute of kernel text/data, kernel identity
			mapping of RAM pages etc. oops!

This bug was easily exposed by another PAT patch which was doing
cpa() more often on kernel identity mapping holes (physical range between
max_low_pfn_mapped and 4GB), where in here it was setting the
cache disable attribute (PCD) for kernel identity mappings as well.

Fix cpa() to handle the kernel identity mapping holes. Retain
the WARN() for cpa() calls to other not present address ranges
(kernel-text/data, ioremap() addresses)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a1e46212

x86: uv cleanup, build fix · 4ec71fa2

Committed by Ingo Molnar on Jan 21, 2009

Fix:

arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
arch/x86/mm/srat_64.c:141: error: for each function it appears in.)

A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
not include that header. Add it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4ec71fa2

x86, mm: move tlb.c to arch/x86/mm/ · 55f4949f

Committed by Ingo Molnar on Jan 21, 2009

Impact: cleanup

Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

55f4949f

20 Jan, 2009 2 commits

x86: optimise x86's do_page_fault (C entry point for the page fault path) · 92181f19

Committed by Nick Piggin on Jan 20, 2009

Impact: cleanup, restructure code to improve assembly

gcc isn't _all_ that smart about spilling registers to stack or reusing
stack slots, even with branch annotations. do_page_fault contained a lot
of functionality, so split unlikely paths into their own functions, and
mark them as noinline just to be sure. I consider this actually to be
somewhat of a cleanup too: the main function now contains about half
the number of lines so the normal path is easier to read, while the error
cases are also nicely split away.

Also, ensure the order of arguments to functions is always the same: regs,
addr, error_code. This can reduce code size a tiny bit, and just looks neater
too.

And add a couple of branch annotations.

Before:
  do_page_fault:
          subq    $360, %rsp      #,

After:
  do_page_fault:
          subq    $56, %rsp       #,

bloat-o-meter:
  add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
  function                                     old     new   delta
  __bad_area_nosemaphore                         -     506    +506
  no_context                                     -     474    +474
  vmalloc_fault                                  -     424    +424
  spurious_fault                                 -     358    +358
  mm_fault_error                                 -     272    +272
  bad_area_access_error                          -      89     +89
  bad_area                                       -      89     +89
  bad_area_nosemaphore                           -      10     +10
  do_page_fault                               2464     784   -1680

Yes, the total size increases by 542 bytes, due to the extra function calls.
But these will very rarely be called (except for vmalloc_fault) in a normal
workload. Importantly, do_page_fault is less than 1/3rd its original size,
and touches far less stack.

Existing gotos and branch hints did move a lot of the infrequently used text
out of the fastpath, but that's even further improved after this patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

92181f19
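
The restructuring in commit 92181f19 follows a general pattern: move rarely taken paths into separate noinline functions so their locals do not enlarge the hot function's stack frame. Below is a minimal sketch of that pattern. It uses GCC/Clang extensions (`__builtin_expect`, `__attribute__((noinline))`), and the function names and the "address 0 is bad" rule are hypothetical.

```c
#include <stdio.h>

#define model_unlikely(x) __builtin_expect(!!(x), 0)
#define model_noinline    __attribute__((noinline))

/*
 * Cold path: kept out of line so its large scratch buffer lives in its
 * own stack frame. Arguments use one fixed order: address, error_code.
 */
static model_noinline int model_bad_area(unsigned long address,
                                         unsigned long error_code)
{
    char report[256];

    snprintf(report, sizeof(report), "bad access at %#lx (error %#lx)",
             address, error_code);
    fprintf(stderr, "%s\n", report);
    return -1;
}

/* Hot path: small frame, with the error case annotated as unlikely. */
int model_do_fault(unsigned long address, unsigned long error_code)
{
    if (model_unlikely(address == 0))
        return model_bad_area(address, error_code);

    return 0;
}
```

Whether this pays off depends on the compiler and target. Comparing the stack adjustment in the generated assembly, as the commit message does with its before/after `subq` values, is the way to check.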

x86: remove kernel_physical_mapping_init() from init section · f5495506

Committed by Gary Hade on Jan 19, 2009

Impact: fix crash with memory hotplug enabled

kernel_physical_mapping_init() is called during memory hotplug
so it does not belong in the init section.

If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
the make command line, arch/x86/mm/init_64.c is compiled with
the -fno-inline-functions-called-once gcc option defeating
inlining of kernel_physical_mapping_init() within init_memory_mapping().

When kernel_physical_mapping_init() is not inlined it is placed
in the .init.text section according to the __init in its current
declaration.  A later call to kernel_physical_mapping_init() during
a memory hotplug operation encounters an int3 trap because the
.init.text section memory has been freed.

This patch eliminates the crash caused by the int3 trap by moving the
non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f5495506

16 Jan, 2009 2 commits

x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2) · a3c6018e

Committed by Jan Beulich on Jan 16, 2009

Debugging and original patch from Nick Piggin <npiggin@suse.de>

The early fixmap pmd entry inserted at the very top of the KVA is causing the
subsequent fixmap mapping code to not provide physically linear pte pages over
the kmap atomic portion of the fixmap (which relies on said property to
calculate pte addresses).

This has caused weird boot failures in kmap_atomic much later in the boot
process (initial userspace faults) on a 32-bit PAE system with a larger number
of CPUs (smaller CPU counts tend not to run over into the next page so don't
show up the problem).

Solve this by attempting to clear out the page table, and copy any of its
entries to the new one. Also, add a bug if a nonlinear condition is encountered
and can't be resolved, which might save some hours of debugging if this fragile
scheme ever breaks again...

Once we have such logic, we can also use it to eliminate the early ioremap
trickery around the page table setup for the fixmap area. This also fixes
potential issues with FIX_* entries sharing the leaf page table with the early
ioremap ones getting discarded by early_ioremap_clear() and not restored by
early_ioremap_reset(). It at once eliminates the temporary (and configuration,
namely NR_CPUS, dependent) unavailability of early fixed mappings during the
time the fixmap area page tables get constructed.

Finally, also replace the hard coded calculation of the initial table space
needed for the fixmap area with a proper one, allowing kernels configured for
large CPU counts to actually boot.

Based-on: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a3c6018e

Revert "x86 PAT: remove CPA WARN_ON for zero pte" · b5db0e38

Committed by Linus Torvalds on Jan 15, 2009

This reverts commit 58dab916, which
makes my Nehalem come to a nasty crawling almost-halt.  It looks like it
turns off caching of regular kernel RAM, with the understandable
slowdown of a few orders of magnitude as a result.
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b5db0e38

15 Jan, 2009 1 commit

x86, pat: fix reserve_memtype() for legacy 1MB range · 5cca0cf1

Committed by Suresh Siddha on Jan 09, 2009

Thierry Vignaud reported:
> http://bugzilla.kernel.org/show_bug.cgi?id=12372
>
> On P4 with an SiS motherboard (video card is a SiS 651)
> X server fails to start with error:
> xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid
> argument)

Here X is trying to map the first 8KB of memory using /dev/mem. Existing
code treats the first 0-4KB of memory as non-RAM and 4KB-8KB as RAM. Recent
code changes don't allow mapping memory with different attributes
at the same time.

Fix this by treating the first 1MB legacy region as special and always
tracking the attribute requests within this region using a linear linked
list (without regard to whether the range is RAM, non-RAM, or mixed).
Reported-and-tested-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5cca0cf1

14 Jan, 2009 3 commits

x86 PAT: remove CPA WARN_ON for zero pte · 58dab916

Committed by venkatesh.pallipadi@intel.com on Jan 09, 2009

Impact: reduce scope of debug check - avoid warnings

The logic to find whether an identity map exists or not using
high_memory or max_low_pfn_mapped/max_pfn_mapped is not complete,
as the memory within the range may not be mapped if there is an
unusable hole in e820.

Specifically, on my test system I started seeing these warnings with
tools like hwinfo, acpidump trying to map ACPI region.

[   27.400018] ------------[ cut here ]------------
[   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
[   27.400821] Hardware name: X7DB8
[   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
[   27.401569] Modules linked in:
[   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec6 #586
[   27.402141] Call Trace:
[   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
[   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
[   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
[   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
[   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
[   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
[   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
[   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
[   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
[   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
[   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
[   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
[   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
[   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
[   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
[   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
[   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
[   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
[   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
[   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
[   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
[   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
[   27.408614] ---[ end trace 4b16ad70c09a602d ]---
[   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000

This is with track_pfn_vma_new trying to keep identity map in sync.
The address cff6a000 is the ACPI region according to e820.

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
[    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
[    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)

And is not mapped as per init_memory_mapping.

[    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
[    0.000000] init_memory_mapping: 0000000100000000-0000000230000000

We can add logic to check for this. But, there can also be other holes in
identity map when we have 1GB of aligned reserved space in e820.

This patch handles it by removing the WARN_ON and returning a specific
error value (EFAULT) to indicate that the address does not have any
identity mapping.

The code that tries to keep identity map in sync can ignore
this error, with other callers of cpa still getting error here.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

58dab916

x86 PAT: return compatible mapping to remap_pfn_range callers · cdecff68

Committed by venkatesh.pallipadi@intel.com on Jan 09, 2009

Impact: avoid warning message, potentially solve 3D performance regression

Change x86 PAT code to return a compatible memtype if the exact memtype that
was requested in remap_pfn_range and friends is not available due to some
conflict.

This is done by returning the compatible type in pgprot parameter of
track_pfn_vma_new(), and the caller uses that memtype for page table.

Note that track_pfn_vma_copy() which is basically called during fork gets the
prot from existing page table and should not have any conflict. Hence we use
strict memtype check there and do not allow compatible memtypes.

This patch fixes the bug reported here:

  http://marc.info/?l=linux-kernel&m=123108883716357&w=2

Specifically the error message:

  X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
  got write-combining

Should go away.
Reported-and-bisected-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cdecff68
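
The in/out parameter convention described in commit cdecff68 can be sketched as below. This is a hedged model: `model_reserve_memtype()`, its `existing` argument, and the use of -EBUSY for a strict conflict are assumptions of this sketch rather than the kernel's actual interface. It only shows a caller's requested type being replaced by the type already in force, unless strict matching is requested (as the commit says is done on the fork path).

```c
#include <errno.h>

enum model_req_type { MODEL_REQ_WB, MODEL_REQ_WC, MODEL_REQ_UC_MINUS };

/*
 * `existing` is the type already reserved for the range, or -1 if none.
 * On a conflict, non-strict callers get the existing type written back
 * through *type; strict callers get an error and *type is left alone.
 */
int model_reserve_memtype(int existing, enum model_req_type *type, int strict)
{
    if (existing < 0 || existing == (int)*type)
        return 0;       /* no reservation yet, or an exact match */

    if (strict)
        return -EBUSY;  /* strict check: compatible types are not allowed */

    *type = (enum model_req_type)existing; /* hand back the compatible type */
    return 0;
}
```

A caller in the role of remap_pfn_range() would then build its page table entries from the value left in *type rather than from what it originally asked for.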

x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param · e4b866ed

Committed by venkatesh.pallipadi@intel.com on Jan 09, 2009

Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No functionality change in this patch.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e4b866ed

13 Jan, 2009 1 commit

x86: avoid theoretical vmalloc fault loop · f313e123

Committed by Andi Kleen on Jan 09, 2009

Ajith Kumar noticed:

 I was going through the vmalloc fault handling for x86_64 and am unclear
 about the following lines in the vmalloc_fault() function.

 pgd = pgd_offset(current->mm ?: &init_mm, address);
 pgd_ref = pgd_offset_k(address);

 Here the intention is to get the pgd corresponding to the current process
 and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
 However, for kernel threads current->mm is NULL and hence pgd =
 pgd_offset(init_mm, address) = pgd_ref which means the fault handler
 returns without setting the pgd entry in the MM structure in the context
 of which the kernel thread has faulted.  This could lead to never-ending
 faults and busy looping of kernel threads like pdflush.  So, shouldn't the
 pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
 pgd_offset(current->active_mm ?: &init_mm, address);

We can use active_mm unconditionally because it should be always set.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f313e123
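
The difference between the two selections discussed in commit f313e123 can be modelled with plain structs. All names here are hypothetical stand-ins for the kernel's task and mm structures. The sketch only shows that, for a kernel thread with mm == NULL, the old expression picks init_mm itself, so there is nothing to synchronise, while active_mm picks the borrowed address space.

```c
#include <stddef.h>

struct model_mm {
    int id;
};

struct model_task {
    struct model_mm *mm;        /* NULL for kernel threads */
    struct model_mm *active_mm; /* address space actually in use */
};

/* Stand-in for init_mm, the reference the fault handler copies from. */
static struct model_mm model_init_mm = { 0 };

/* Old selection: kernel threads fall back to init_mm. */
struct model_mm *model_pick_mm_old(const struct model_task *task)
{
    return task->mm ? task->mm : &model_init_mm;
}

/* New selection: always the address space that is actually in use. */
struct model_mm *model_pick_mm_new(const struct model_task *task)
{
    return task->active_mm;
}

/* Syncing from init_mm only makes progress if the target differs from it. */
int model_fault_can_sync(const struct model_mm *picked)
{
    return picked != &model_init_mm;
}
```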

08 Jan, 2009 2 commits

trivial: replace last usages of __FUNCTION__ in kernel · 9b4778f6

Committed by Harvey Harrison on Jan 07, 2009

__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9b4778f6

resource: allow MMIO exclusivity for device drivers · e8de1481

Committed by Arjan van de Ven on Oct 22, 2008

Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.

This patch adds the option for device drivers to cause their reserved
regions to be added to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.

In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
driver issues from userspace.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

e8de1481

07 Jan, 2009 3 commits

x86: smp.h move zap_low_mappings declartion to tlbflush.h · dacf7333

Committed by Jaswinder Singh Rajput on Jan 07, 2009

Impact: cleanup, moving NON-SMP stuff from smp.h
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dacf7333

mm: show node to memory section relationship with symlinks in sysfs · c04fc586

Committed by Gary Hade on Jan 06, 2009

Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c04fc586

mm: invoke oom-killer from page fault · 1c0fe6e3

Committed by Nick Piggin on Jan 06, 2009

Rather than have the pagefault handler kill a process directly if it gets
a VM_FAULT_OOM, have it call into the OOM killer.

With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
oom killing throttling, oom priority adjustment or selective disabling,
panic on oom, etc), it's silly to unconditionally kill the faulting
process at page fault time.  Create a hook for pagefault oom path to call
into instead.

Only converted x86 and uml so far.

[akpm@linux-foundation.org: make __out_of_memory() static]
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c0fe6e3

06 Jan, 2009 1 commit

x86: k8 numa register active regions later · 40bcc69b

Committed by Yinghai Lu on Jan 05, 2009

Impact: cleanup

Don't register early, so we don't need to clear active regions if it fails
to get node hash shift or wild set in nb config.

Also remove the nodeids array, which is not needed.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

40bcc69b