  1. 24 Feb 2013, 1 commit
  2. 18 Feb 2013, 1 commit
  3. 13 Feb 2013, 1 commit
    • x86/mm: Check if PUD is large when validating a kernel address · 0ee364eb
      Authored by Mel Gorman
      A user reported the following oops when a backup process reads
      /proc/kcore:
      
       BUG: unable to handle kernel paging request at ffffbb00ff33b000
       IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
       [...]
      
       Call Trace:
        [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
        [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
        [<ffffffff81151687>] vfs_read+0xc7/0x130
        [<ffffffff811517f3>] sys_read+0x53/0xa0
        [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
      
      Investigation determined that the bug triggered when reading
      system RAM at the 4G mark. On this system, that was the first
      address using 1G pages for the virt->phys direct mapping so the
      PUD is pointing to a physical address, not a PMD page.
      
      The problem is that the page table walker in kern_addr_valid() does
      not check pud_large() and treats the physical address as if it were
      a PMD.  If it happens to look like pmd_none then it will silently
      fail, probably returning zeros instead of real data.  If the data
      happens to look like a present PMD, though, it will be walked,
      resulting in the oops above.
      
      This patch adds the necessary pud_large() check (a sketch of the
      check follows this entry).
      
      Unfortunately, the problem was not readily reproducible, and the user
      is now running the backup program without accessing /proc/kcore, so
      the patch has not been validated, but I think it makes sense.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: stable@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0ee364eb
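
      Below is a minimal sketch of the check the commit above describes,
      inside the kern_addr_valid() walker (simplified; pud_large() and
      pud_pfn() are the usual x86 page-table helpers, and the exact hunk
      in the patch may differ slightly):

        pud = pud_offset(pgd, addr);
        if (pud_none(*pud))
                return 0;

        /*
         * A large (1G) PUD maps physical memory directly; there is no
         * PMD page to descend into, so validate the pfn here instead of
         * dereferencing the physical address as if it were a PMD table.
         */
        if (pud_large(*pud))
                return pfn_valid(pud_pfn(*pud));

        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd))
                return 0;
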
  4. 08 Feb 2013, 1 commit
  5. 01 Feb 2013, 2 commits
    • x86-32, mm: Remove reference to alloc_remap() · 07f4207a
      Authored by H. Peter Anvin
      We have removed the remap allocator for x86-32, and x86-64 never had
      it (and doesn't need it).  Remove residual reference to it.
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/CAE9FiQVn6_QZi3fNQ-JHYiR-7jeDJ5hT0SyT_%2BzVvfOj=PzF3w@mail.gmail.com
      07f4207a
    • x86-32, mm: Rip out x86_32 NUMA remapping code · f03574f2
      Authored by Dave Hansen
      This code was an optimization for 32-bit NUMA systems.
      
      It has probably been the cause of a number of subtle bugs over
      the years, although the conditions to excite them would have
      been hard to trigger.  Essentially, we remap part of the kernel
      linear mapping area, and then sometimes part of that area gets
      freed back in to the bootmem allocator.  If those pages get
      used by kernel data structures (say mem_map[] or a dentry),
      there's no big deal.  But, if anyone ever tried to use the
      linear mapping for these pages _and_ cared about their physical
      address, bad things happen.
      
      For instance, say you passed __GFP_ZERO to the page allocator and
      then happened to get handed one of these pages: the allocator would
      zero the remapped page, but any PTE created for it would point to
      the _old_ page.
      There are probably a hundred other ways that it could screw
      with things.
      
      We don't need to hang on to performance optimizations for
      these old boxes any more.  All my 32-bit NUMA systems are long
      dead and buried, and I probably had access to more than most
      people.
      
      This code is causing real things to break today:
      
      	https://lkml.org/lkml/2013/1/9/376
      
      I looked into actually fixing this, but it requires surgery on way
      too much brittle code, as well as on things like
      per_cpu_ptr_to_phys().
      
      [ hpa: Cc: this for -stable, since it is a memory corruption issue.
        However, an alternative is to simply mark NUMA as depends BROKEN
        rather than EXPERIMENTAL in the X86_32 subclause... ]
      
      Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      f03574f2
  6. 31 Jan 2013, 1 commit
  7. 30 Jan 2013, 7 commits
    • x86, 64bit, mm: Mark data/bss/brk to nx · 72212675
      Authored by Yinghai Lu
      HPA said we should not have RW and +x set at the same time.

      For this kernel layout:
      [    0.000000] Kernel Layout:
      [    0.000000]   .text: [0x01000000-0x021434f8]
      [    0.000000] .rodata: [0x02200000-0x02a13fff]
      [    0.000000]   .data: [0x02c00000-0x02dc763f]
      [    0.000000]   .init: [0x02dc9000-0x0312cfff]
      [    0.000000]    .bss: [0x0313b000-0x03dd6fff]
      [    0.000000]    .brk: [0x03dd7000-0x03dfffff]
      
      Before the patch, we have:
      ---[ High Kernel Mapping ]---
      0xffffffff80000000-0xffffffff81000000          16M                           pmd
      0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
      0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
      0xffffffff82c00000-0xffffffff82dc9000        1828K     RW             GLB x  pte
      0xffffffff82dc9000-0xffffffff82e00000         220K     RW             GLB NX pte
      0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
      0xffffffff83000000-0xffffffff8313a000        1256K     RW             GLB NX pte
      0xffffffff8313a000-0xffffffff83200000         792K     RW             GLB x  pte
      0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB x  pmd
      0xffffffff83e00000-0xffffffffa0000000         450M                           pmd
      
      After the patch, we get:
      ---[ High Kernel Mapping ]---
      0xffffffff80000000-0xffffffff81000000          16M                           pmd
      0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
      0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
      0xffffffff82c00000-0xffffffff82e00000           2M     RW             GLB NX pte
      0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
      0xffffffff83000000-0xffffffff83200000           2M     RW             GLB NX pte
      0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB NX pmd
      0xffffffff83e00000-0xffffffffa0000000         450M                           pmd
      
      So .data, .bss and .brk now get NX (a sketch of the change follows
      this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      72212675
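
      A sketch of the assumed shape of the change in mark_rodata_ro():
      extend the NX range so it runs from the start of .rodata through the
      end of .brk instead of stopping after the rodata region (symbol names
      follow the usual kernel linker script; the exact code may differ):

        unsigned long rodata_start = PFN_ALIGN(__start_rodata);
        unsigned long all_end = PFN_ALIGN(&_end);       /* end of .brk */

        /* Previously the NX range ended after the rodata region, leaving
         * pieces of .data/.bss/.brk mapped RW and executable; now the
         * whole span through .brk is marked NX. */
        set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
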
    • x86, kexec, 64bit: Only set ident mapping for ram. · 0e691cf8
      Authored by Yinghai Lu
      We should set mappings only for usable memory ranges under max_pfn,
      otherwise we cause the same problem that is fixed by

      	x86, mm: Only direct map addresses that are marked as E820_RAM

      This patch exposes the pfn_mapped array and only sets the identity
      mapping for ranges in that array.

      This patch relies on the new kernel_ident_mapping_init(), which can
      handle existing pgd/pud entries across different calls (a sketch of
      the resulting setup follows this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      0e691cf8
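
      A sketch of the resulting kexec page-table setup, assuming the shape
      of the exposed pfn_mapped array and of kernel_ident_mapping_init()
      (simplified; the allocator and flag names are illustrative):

        struct x86_mapping_info info = {
                .alloc_pgt_page = alloc_pgt_page,       /* kexec's allocator */
                .context        = image,
                .pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
        };
        pgd_t *level4p = (pgd_t *)__va(start_pgtable);
        int i, result;

        /* Identity-map only the ranges the kernel itself direct-mapped. */
        for (i = 0; i < nr_pfn_mapped; i++) {
                unsigned long mstart = PFN_PHYS(pfn_mapped[i].start);
                unsigned long mend   = PFN_PHYS(pfn_mapped[i].end);

                result = kernel_ident_mapping_init(&info, level4p,
                                                   mstart, mend);
                if (result)
                        return result;
        }
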
    • x86, 64bit: Don't set max_pfn_mapped wrong value early on native path · 10054230
      Authored by Yinghai Lu
      max_pfn_mapped is not set correctly until init_memory_mapping(), so
      don't print its initial value for 64-bit.

      We also need to use KERNEL_IMAGE_SIZE directly for the highmap
      cleanup.
      
      -v2: update comments about max_pfn_mapped according to Stefano Stabellini.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      10054230
    • x86, 64bit: Use a #PF handler to materialize early mappings on demand · 8170e6be
      Authored by H. Peter Anvin
      Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
      64-bit code has to use page tables.  This makes it awkward before we
      have first set up properly all-covering page tables to access objects
      that are outside the static kernel range.
      
      So far we have dealt with that simply by mapping a fixed amount of
      low memory, but that fails in at least two upcoming use cases:
      
      1. We will support loading and running the kernel, struct
         boot_params, ramdisk, command line, etc. above the 4 GiB mark.
      2. We need to access the ramdisk early to get microcode updates
         applied as early as possible.

      We could use early_iomap to access them too, but it would make the
      code messy and hard to unify with 32-bit.
      
      Hence, set up a #PF handler and use a fixed number of buffers to set
      up page tables on demand.  If the buffers fill up then we simply
      flush them and start over.  These buffers are all in __initdata, so
      it does not increase RAM usage at runtime.
      
      Thus, with the help of the #PF handler, we can set the final kernel
      mapping from blank, and switch to init_level4_pgt later.
      
      During the switchover in head_64.S, before the #PF handler is
      available, we use three pages to handle the kernel crossing the 1G
      and 512G boundaries with a shared page, by playing games with page
      aliasing: the same page is mapped twice in the higher-level tables
      with appropriate wraparound.  The kernel region itself will be
      properly mapped; other mappings may be spurious.

      early_make_pgtable() uses the kernel high-mapping address to access
      the pages it uses to set up the page tables (a conceptual sketch
      follows this entry).
      
      -v4: Add phys_base offset to make kexec happy, and add
      	init_mapping_kernel()   - Yinghai
      -v5: Fix compiling with Xen, and add back ident level3 and level2 for
           Xen; also move init_level4_pgt back from BSS to DATA again,
           because we have to clear it anyway.  - Yinghai
      -v6: Switch to init_level4_pgt in init_mem_mapping. - Yinghai
      -v7: Remove the unneeded clear_page for init_level4_page;
           it is already filled with "fill 512,8,0" in head_64.S.  - Yinghai
      -v8: We need to keep that handler alive until init_mem_mapping and
           not let early_trap_init trash that early #PF handler.
           So split early_trap_pf_init out and move it down. - Yinghai
      -v9: The switchover only covers kernel space instead of 1G, so we
           can avoid touching possible memory holes. - Yinghai
      -v11: Change the far jmp back to a far return to initial_code; that
           is needed to fix a failure reported by Konrad on AMD systems.
           - Yinghai
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      8170e6be
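
      A conceptual sketch of the mechanism (heavily simplified; the real
      code lives in head_64.S/head64.c and the identifiers below are only
      indicative): a small __initdata pool of page-table pages is consumed
      on demand by the early #PF handler and recycled when it runs out.

        /* Fixed pool of early page-table pages, all in __initdata. */
        static pud_t early_pgt_pool[NR_EARLY_PGT_PAGES][512] __initdata;
        static int next_early_pgt __initdata;

        int __init early_make_pgtable(unsigned long address)
        {
                /* Walk the early pgd for the faulting address; whenever an
                 * entry is missing, take the next page from the pool. */
                if (next_early_pgt >= NR_EARLY_PGT_PAGES) {
                        /* Pool exhausted: flush the on-demand entries and
                         * start over; the static kernel mapping survives. */
                        reset_early_page_tables();
                        next_early_pgt = 0;
                }
                /* ... install pud/pmd entries that point into the pool,
                 * ending with a 2M mapping that covers the fault ... */
                return 0;       /* retry the faulting access */
        }
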
    • x86, 64bit, mm: Add generic kernel/ident mapping helper · aece2785
      Authored by Yinghai Lu
      This is a simple version of kernel_physical_mapping_init(); it builds
      one page table that will be used later.

      Use mapping_info to control:
              1. the alloc_pgt_page method,
              2. whether the PMD is EXEC,
              3. whether the pgd is the kernel low mapping or the ident
                 mapping.

      It will be used to replace some local versions in kexec, hibernation,
      etc. (a sketch of the interface follows this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      aece2785
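
      A sketch of the helper's assumed interface (field names as used by
      later callers such as kexec; treat the exact layout as illustrative):

        struct x86_mapping_info {
                void *(*alloc_pgt_page)(void *); /* get a page-table page */
                void *context;          /* passed to alloc_pgt_page */
                unsigned long pmd_flag; /* flags for PMD entries, e.g. EXEC */
                bool kernel_mapping;    /* kernel low mapping vs. ident */
        };

        int kernel_ident_mapping_init(struct x86_mapping_info *info,
                                      pgd_t *pgd_page,
                                      unsigned long addr, unsigned long end);
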
    • x86, 64bit, mm: Make pgd next calculation consistent with pud/pmd · c2bdee59
      Authored by Yinghai Lu
      Calculate 'next' for the pgd just like we do for the pud and pmd:
      round down and add the size.

      Also, do not do boundary checking with 'next'; just pass 'end' down
      to phys_pud_init() instead, because the loop in phys_pud_init() stops
      at PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly
      (a sketch follows this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-6-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      c2bdee59
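
      A sketch of the consistent 'next' calculation in the pgd loop of
      kernel_physical_mapping_init() (simplified):

        for (; start < end; start = next) {
                pgd_t *pgd = pgd_offset_k(start);
                pud_t *pud;     /* existing or freshly allocated pud page */

                /* Round down to the start of this pgd slot, then add one
                 * slot, exactly as the pud/pmd levels compute their next. */
                next = (start & PGDIR_MASK) + PGDIR_SIZE;

                /* No clamping of 'next' against 'end': pass 'end' straight
                 * down, since phys_pud_init() stops at PTRS_PER_PUD. */
                last_map_addr = phys_pud_init(pud, __pa(start), __pa(end),
                                              page_size_mask);
                /* ... write the pgd entry and continue ... */
        }
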
    • x86, mm: Fix page table early allocation offset checking · c9b3234a
      Authored by Yinghai Lu
      While debugging loading the kernel above 4G, we found that one page
      in the pre-allocated BRK area for early page allocation was not being
      used.  pgt_buf_top is the first address that cannot be used, so we
      should check whether the new end is above that top; otherwise the
      last page will not be used.

      Fix that check and also add a printout for allocations from the
      pre-allocated BRK area, to catch possible bugs later.

      But after we get that page back for page tables, it triggers a bug in
      page-table allocation with Xen: we need to avoid using a page as a
      page table to map a range that overlaps with that page-table page
      itself.

      Add a check for that overlap and, when it happens, use memblock
      allocation instead.  That fixes the crash on a Xen PV guest with 2G
      that Stefan found (a sketch of the check follows this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-2-git-send-email-yinghai@kernel.org
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      c9b3234a
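
      A sketch of the assumed shape of the fixed check in the early
      allocator: use the pre-allocated BRK area unless it would overflow or
      the page-table page would sit inside the range being mapped, and fall
      back to memblock otherwise (pgt_buf_end/pgt_buf_top come from the
      existing early-allocation code; the memblock helper and the exact
      condition are illustrative):

        /* Note '>' rather than '>=': pgt_buf_top itself is the first
         * unusable address, so an end equal to it still fits and the
         * last BRK page is no longer wasted. */
        if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
                /* Overlap with the range being mapped, or no room left:
                 * take the page-table pages from memblock instead. */
                pfn = alloc_low_pages_from_memblock(num); /* hypothetical */
        } else {
                pfn = pgt_buf_end;
                pgt_buf_end += num;
                printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
                       pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
        }
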
  8. 26 Jan 2013, 3 commits
  9. 25 Jan 2013, 1 commit
  10. 24 Jan 2013, 2 commits
  11. 16 Dec 2012, 2 commits
  12. 13 Dec 2012, 2 commits
  13. 12 Dec 2012, 1 commit
  14. 11 Dec 2012, 2 commits
    • x86: mm: drop TLB flush from ptep_set_access_flags · e4a1cc56
      Authored by Rik van Riel
      Intel has an architectural guarantee that the TLB entry causing
      a page fault gets invalidated automatically. This means
      we should be able to drop the local TLB invalidation.
      
      Because of the way other areas of the page fault code work,
      chances are good that all x86 CPUs do this.  However, if
      someone somewhere has an x86 CPU that does not invalidate
      the TLB entry causing a page fault, this one-liner should
      be easy to revert.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      e4a1cc56
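
      The assumed shape of the x86 ptep_set_access_flags() after the change
      described above: write the more permissive PTE and rely on the CPU
      invalidating the TLB entry that caused the fault, with no explicit
      flush (a sketch, not the exact hunk):

        int ptep_set_access_flags(struct vm_area_struct *vma,
                                  unsigned long address, pte_t *ptep,
                                  pte_t entry, int dirty)
        {
                int changed = !pte_same(*ptep, entry);

                if (changed && dirty) {
                        *ptep = entry;
                        pte_update_defer(vma->vm_mm, address, ptep);
                        /* The (local) TLB flush that used to follow here
                         * has been dropped; see the reasoning above. */
                }

                return changed;
        }
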
    • x86: mm: only do a local tlb flush in ptep_set_access_flags() · 0f9a921c
      Authored by Rik van Riel
      The function ptep_set_access_flags() is only ever invoked to set access
      flags or add write permission on a PTE.  The write bit is only ever set
      together with the dirty bit.
      
      Because we only ever upgrade a PTE, it is safe to skip flushing entries on
      remote TLBs. The worst that can happen is a spurious page fault on other
      CPUs, which would flush that TLB entry.
      
      Lazily letting another CPU incur a spurious page fault occasionally is
      (much!) cheaper than aggressively flushing everybody else's TLB.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      0f9a921c
  15. 06 Dec 2012, 1 commit
  16. 01 Dec 2012, 1 commit
    • context_tracking: New context tracking subsystem · 91d1aa43
      Authored by Frederic Weisbecker
      Create a new subsystem that probes on kernel boundaries to keep track
      of the transitions between context levels, with two basic initial
      contexts: user and kernel.

      This is an abstraction of some RCU code that uses such tracking to
      implement its userspace extended quiescent state.

      We need to pull this up from RCU into this new level of indirection
      because this tracking is also going to be used to implement "on
      demand" generic virtual cputime accounting, a necessary step toward
      shutting down the tick while still accounting cputime (a sketch of
      the per-CPU tracking follows this entry).
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      [ paulmck: fix whitespace error and email address. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      91d1aa43
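
      A sketch of the state the subsystem keeps, assuming per-CPU tracking
      toggled by the boundary probes (simplified; identifiers indicative):

        /* Per-CPU record of which context this CPU is currently in. */
        struct context_tracking {
                bool active;                      /* tracking enabled here? */
                enum { IN_KERNEL = 0, IN_USER } state;
        };
        static DEFINE_PER_CPU(struct context_tracking, context_tracking);

        /* Probe on the kernel->user boundary: tell interested subsystems,
         * RCU first of all, that this CPU now runs user code. */
        void user_enter(void)
        {
                unsigned long flags;

                local_irq_save(flags);
                if (__this_cpu_read(context_tracking.active) &&
                    __this_cpu_read(context_tracking.state) != IN_USER) {
                        __this_cpu_write(context_tracking.state, IN_USER);
                        rcu_user_enter();       /* extended quiescent state */
                }
                local_irq_restore(flags);
        }
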
  17. 30 Nov 2012, 2 commits
  18. 23 Nov 2012, 1 commit
    • x86/mm: Don't flush the TLB on #WP pmd fixups · 5e4bf1a5
      Authored by Ingo Molnar
      If we have a write-protection #PF and fix up the pmd, then the
      hugetlb code [the only user of pmdp_set_access_flags], in its
      do_huge_pmd_wp_page() page-fault resolution function, calls
      pmdp_set_access_flags() to mark the pmd permissive again, and
      flushes the TLB.
      
      This TLB flush is unnecessary: a flush on #PF is guaranteed on
      most (all?) x86 CPUs, and even in the worst-case we'll generate
      a spurious fault.
      
      So remove it.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Turner <pjt@google.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20121120120251.GA15742@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5e4bf1a5
  19. 18 Nov 2012, 8 commits