提交 · 5bd5a452662bc37c54fb6828db1a3faf87e6511c · openanolis / cloud-kernel

18 11月, 2010 2 次提交

x86: Add NX protection for kernel data · 5bd5a452

由 Matthieu Castet 提交于 11月 16, 2010

This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.

The following steps are taken to achieve this:

 1. Linker script is adjusted so .text always starts and ends on a page bound
 2. Linker script is adjusted so .rodata always start and end on a page boundary
 3. NX is set for all pages from _etext through _end in mark_rodata_ro.
 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
 5. bios rom is set to x when pcibios is used.

The results of patch application may be observed in the diff of kernel page
table dumps:

pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
 +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

No pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc0100000           1M     RW             GLB NX pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

 -v1:  initial patch for 2.6.30
 -v2:  patch for 2.6.31-rc7
 -v3:  moved all code into arch/x86, adjusted credits
 -v4:  fixed ifdef, removed credits from CREDITS
 -v5:  fixed an address calculation bug in mark_nxdata_nx()
 -v6:  added acked-by and PT dump diff to commit log
 -v7:  minor adjustments for -tip
 -v8:  rework with the merge of "Set first MB as RW+NX"
Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5bd5a452

x86: Fix improper large page preservation · 64edc8ed

由 matthieu castet 提交于 11月 16, 2010

This patch fixes a bug in try_preserve_large_page() which may
result in improper large page preservation and improper
application of page attributes to the memory area outside of the
original change request.

More specifically, the problem manifests itself when set_memory_*()
is called for several pages at the beginning of the large page and
try_preserve_large_page() erroneously concludes that the change can
be applied to whole large page.

The fix consists of 3 parts:

  1. Addition of "required" protection attributes in
     static_protections(), so .data and .bss can be guaranteed to
     stay "RW"

  2. static_protections() is now called for every small
     page within large page to determine compatibility of new
     protection attributes (instead of just small pages within the
     requested range).

  3. Large page can be preserved only if attribute change is
     large-page-aligned and covers whole large page.

 -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2
 -v2: Replaced pfn check with address check for kernel rw-data
Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu>
Reviewed-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F7F3.8030809@free.fr>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

64edc8ed

01 11月, 2010 1 次提交

x86, mm: Fix section mismatch in tlb.c · cf38d0ba

由 Rakib Mullick 提交于 11月 01, 2010

Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback
function, which is called from __cpuinit init_smp_flash(). So -
it's safe.

We were warned by the following warning:

 WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch
 in reference from the function tlb_cpuhp_notify() to the
 function .cpuinit.text:calculate_tlb_offset()
 The function tlb_cpuhp_notify() references
 the function __cpuinit calculate_tlb_offset().
 This is often because tlb_cpuhp_notify lacks a __cpuinit
 annotation or the annotation of calculate_tlb_offset is wrong.
Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cf38d0ba

29 10月, 2010 1 次提交

x86, memblock: Fix early_node_mem with big reserved region. · 419db274

由 Yinghai Lu 提交于 10月 28, 2010

Xen can reserve huge amounts of memory for pre-ballooning, but that
still shows as RAM in the e820 memory map.  early_node_mem could not
find range because of start/end adjusting, and will go through the
fallback path.  However, the fallback patch is still using
memblock_x86_find_range_node(), and it is partially top-down because
it go through active_range entries from low to high.

Let's use memblock_find_in_range instead memblock_x86_find_range_node.
So get real top down in fallback path.

We may still need to make memblock_x86_find_range_node to do overall
top_down work.
Reported-by: NJeremy Fitzhardinge <jeremy@goop.org>
Tested-by: NJeremy Fitzhardinge <jeremy@goop.org>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CC9A9C9.8020700@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

419db274

28 10月, 2010 2 次提交

Remove duplicate includes from many files · 61d8e11e

由 Zimny Lech 提交于 10月 27, 2010

Signed-off-by: NZimny Lech <napohybelskurwysynom2010@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61d8e11e

mm: fix race in kunmap_atomic() · 20273941

由 Peter Zijlstra 提交于 10月 27, 2010

Christoph reported a nice splat which illustrated a race in the new stack
based kmap_atomic implementation.

The problem is that we pop our stack slot before we're completely done
resetting its state -- in particular clearing the PTE (sometimes that's
CONFIG_DEBUG_HIGHMEM).  If an interrupt happens before we actually clear
the PTE used for the last slot, that interrupt can reuse the slot in a
dirty state, which triggers a BUG in kmap_atomic().

Fix this by introducing kmap_atomic_idx() which reports the current slot
index without actually releasing it and use that to find the PTE and delay
the _pop() until after we're completely done.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: NChristoph Hellwig <hch@infradead.org>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

20273941

27 10月, 2010 3 次提交

x86: access_error API cleanup · 68da336a

由 Michel Lespinasse 提交于 10月 26, 2010

access_error() already takes error_code as an argument, so there is
no need for an additional write flag.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68da336a

mm: retry page fault when blocking on disk transfer · d065bd81

由 Michel Lespinasse 提交于 10月 26, 2010

This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs.

It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
site wants mmap_sem to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
do_page_fault() will then re-acquire mmap_sem and retry the page fault.

It is expected that the retry will hit the same page which will now be
cached, and thus it will complete with a low mmap_sem hold time.

Tests:

- microbenchmark: thread A mmaps a large file and does random read accesses
  to the mmaped area - achieves about 55 iterations/s. Thread B does
  mmap/munmap in a loop at a separate location - achieves 55 iterations/s
  before, 15000 iterations/s after.

- We are seeing related effects in some applications in house, which show
  significant performance regressions when running without this change.

[akpm@linux-foundation.org: fix warning & crash]
Signed-off-by: NMichel Lespinasse <walken@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d065bd81

mm: stack based kmap_atomic() · 3e4d3af5

由 Peter Zijlstra 提交于 10月 26, 2010

Keep the current interface but ignore the KM_type and use a stack based
approach.

The advantage is that we get rid of crappy code like:

	#define __KM_PTE			\
		(in_nmi() ? KM_NMI_PTE : 	\
		 in_irq() ? KM_IRQ_PTE :	\
		 KM_PTE0)

and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.

The downside is that FRV kmap_atomic() gets more expensive.

For now we use a CPP trick suggested by Andrew:

  #define kmap_atomic(page, args...) __kmap_atomic(page)

to avoid having to touch all kmap_atomic() users in a single patch.

[ not compiled on:
  - mn10300: the arch doesn't actually build with highmem to begin with ]

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NChris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e4d3af5

21 10月, 2010 3 次提交

x86: Spread tlb flush vector between nodes · 93296720

由 Shaohua Li 提交于 10月 20, 2010

Currently flush tlb vector allocation is based on below equation:
sender = smp_processor_id() % 8
This isn't optimal, CPUs from different node can have the same vector, this
causes a lot of lock contention. Instead, we can assign the same vectors to
CPUs from the same node, while different node has different vectors. This has
below advantages:
a. if there is lock contention, the lock contention is between CPUs from one
node. This should be much cheaper than the contention between nodes.
b. completely avoid lock contention between nodes. This especially benefits
kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity
to specific node.

In my test, this could reduce > 20% CPU overhead in extreme case.The test
machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd
to the first CPU of the node. I run a workload with 4 sequential mmap file
read thread. The files are empty sparse file. This workload will trigger a
lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the
extreme tlb flush lock contention because otherwise kswapd keeps migrating
between CPUs of a node and I can't get stable result. Sure in real workload,
we can't always see so big tlb flush lock contention, but it's possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

93296720

x86-32, mm: Add an initial page table for core bootstrapping · b40827fa

由 Borislav Petkov 提交于 8月 28, 2010

This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs - we have
initial_page_table for that.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
LKML-Reference: <20101020070526.GA9588@liondog.tnic>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

b40827fa

x86, mm: Fix incorrect data type in vmalloc_sync_all() · f01f7c56

由 Borislav Petkov 提交于 10月 19, 2010

arch/x86/mm/fault.c: In function 'vmalloc_sync_all':
arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast

introduced by 617d34d9.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

f01f7c56

20 10月, 2010 2 次提交

x86, mm: Hold mm->page_table_lock while doing vmalloc_sync · 617d34d9

由 Jeremy Fitzhardinge 提交于 9月 21, 2010

Take mm->page_table_lock while syncing the vmalloc region.  This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held.  If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.

vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.

The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgds.
Reported-by: NIan Campbell <ian.cambell@eu.citrix.com>
Originally-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <4CB88A4C.1080305@goop.org>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

617d34d9

x86, mm: Fix bogus whitespace in sync_global_pgds() · 44235dcd

由 Jeremy Fitzhardinge 提交于 10月 14, 2010

Whitespace cleanup only.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

44235dcd

15 10月, 2010 1 次提交

x86: Barf when vmalloc and kmemcheck faults happen in NMI · ebc8827f

由 Frederic Weisbecker 提交于 9月 27, 2010

In x86, faults exit by executing the iret instruction, which then
reenables NMIs if we faulted in NMI context. Then if a fault
happens in NMI, another NMI can nest after the fault exits.

But we don't yet support nested NMIs because we have only one NMI
stack. To prevent from that, check that vmalloc and kmemcheck
faults don't happen in this context. Most of the other kernel faults
in NMIs can be more easily spotted by finding explicit
copy_from,to_user() calls on review.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>

ebc8827f

14 10月, 2010 1 次提交

xen: Cope with unmapped pages when initializing kernel pagetable · fef5ba79

由 Jeremy Fitzhardinge 提交于 10月 13, 2010

Xen requires that all pages containing pagetable entries to be mapped
read-only.  If pages used for the initial pagetable are already mapped
then we can change the mapping to RO.  However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.

We do this by knowing that the kernel pagetable memory is pre-allocated
in the range e820_table_start - e820_table_end, so any pfn within this
range should be mapped read-only.  However, the pagetable setup code
early_ioremaps the pages to write their entries, so we must make sure
that mappings created in the early_ioremap fixmap area are mapped RW.
(Those mappings are removed before the pages are presented to Xen
as pagetable pages.)
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4CB63A80.8060702@goop.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

fef5ba79

12 10月, 2010 1 次提交

x86, numa: For each node, register the memory blocks actually used · 73cf624d

由 Yinghai Lu 提交于 10月 10, 2010

Russ reported SGI UV is broken recently. He said:

| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
|Previously, the kernel early_node_map[] would show three entries
|with the proper node.
|
|[    0.000000]     0: 0x00100000 -> 0x00800000
|[    0.000000]     1: 0x00800000 -> 0x01000000
|[    0.000000]     0: 0x01000000 -> 0x01080000
|
|The problem is recent community kernel early_node_map[] shows
|only two entries with the node 0 entry overlapping the node 1
|entry.
|
|    0: 0x00100000 -> 0x01080000
|    1: 0x00800000 -> 0x01000000

After looking at the changelog, Found out that it has been broken for a while by
following commit

|commit 8716273c
|Author: David Rientjes <rientjes@google.com>
|Date:   Fri Sep 25 15:20:04 2009 -0700
|
|    x86: Export srat physical topology

Before that commit, register_active_regions() is called for every SRAT memory
entry right away.

Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node.  nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.
Reported-by: NRuss Anderson <rja@sgi.com>
Tested-by: NRuss Anderson <rja@sgi.com>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

73cf624d

08 10月, 2010 1 次提交

x86: HWPOISON: Report correct address granuality for huge hwpoison faults · f672b49b

由 Andi Kleen 提交于 9月 27, 2010

An earlier patch fixed the hwpoison fault handling to encode the
huge page size in the fault code of the page fault handler.

This is needed to report this information in SIGBUS to user space.

This is a straight forward patch to pass this information
through to the signal handling in the x86 specific fault.c

Cc: x86@kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

f672b49b

06 10月, 2010 2 次提交

x86, memblock: Remove __memblock_x86_find_in_range_size() · 16c36f74

由 Yinghai Lu 提交于 10月 04, 2010

Fold it into memblock_x86_find_in_range(), and change bad_addr_size()
to check_reserve_memblock().

So whole memblock_x86_find_in_range_size() code is more readable.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CAA4DEC.4000401@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

16c36f74

x86-32, memblock: Make add_highpages honor early reserved ranges · 1d931264

由 Yinghai Lu 提交于 10月 05, 2010

Originally the only early reserved range that is overlapped with high
pages is "KVA RAM", but we already do remove that from the active ranges.

However, It turns out Xen could have that kind of overlapping to support memory
ballooning.x

So we need to make add_highpage_with_active_regions() to subtract
memblock reserved just like low ram; this is the proper design anyway.

In this patch, refactering get_freel_all_memory_range() to make it can
be used by add_highpage_with_active_regions().  Also we don't need to
remove "KVA RAM" from active ranges.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABB183.1040607@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

1d931264

23 9月, 2010 1 次提交

x86, cleanups: Use clear_page/copy_page rather than memset/memcpy · 234bb549

由 Jan Beulich 提交于 9月 02, 2010

When operating on whole pages, use clear_page() and copy_page() in
favor of memset() and memcpy(); after all that's what they are
intended for.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

234bb549

21 9月, 2010 1 次提交

x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB · 23ac4ae8

由 Andreas Herrmann 提交于 9月 17, 2010

The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.
Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

23ac4ae8

05 9月, 2010 1 次提交

x86: Fix the address space annotations of iomap_atomic_prot_pfn() · cc1a8e52

由 Francisco Jerez 提交于 9月 04, 2010

This patch fixes the sparse warnings when the return pointer of
iomap_atomic_prot_pfn() is used as an argument of iowrite32()
and friends.
Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cc1a8e52

03 9月, 2010 1 次提交

x86, mm: fix uninitialized addr in kernel_physical_mapping_init() · 1c5f50ee

由 Wu Fengguang 提交于 9月 03, 2010

This re-adds the lost chunk in commit 9b861528.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haicheng Li <haicheng.li@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100903090407.GA19771@localhost>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1c5f50ee

30 8月, 2010 1 次提交

x86, kmemcheck: Remove double test · 9fbaf49c

由 Julia Lawall 提交于 8月 28, 2010

The opcodes 0x2e and 0x3e are tested for in the first Group 2
line as well.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@expression@
expression E;
@@

(
* E
  || ... || E
|
* E
  && ... && E
)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9fbaf49c

28 8月, 2010 14 次提交

x86: Remove old bootmem code · 774ea0bc

由 Yinghai Lu 提交于 8月 25, 2010

Requested by Ingo, Thomas and HPA.

The old bootmem code is no longer necessary, and the transition is
complete.  Remove it.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

774ea0bc

x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve · 6f2a7536

由 Yinghai Lu 提交于 8月 25, 2010

memblock_memory_size() will return memory size in memblock.memory.region.
memblock_free_memory_size() will return free memory size in memblock.memory.region.

So We can get exact reseved size in specified range.

Set the size right after initmem_init(), because later bootmem API will
get area above 16M. (except some fallback).

Later after we remove the bootmem, We could call that just before paging_init().
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

6f2a7536

x86, memblock: Replace e820_/_early string with memblock_ · a9ce6bc1

由 Yinghai Lu 提交于 8月 25, 2010

1.include linux/memblock.h directly. so later could reduce e820.h reference.
2 this patch is done by sed scripts mainly

-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

a9ce6bc1

x86: Use memblock to replace early_res · 72d7c3b3

由 Yinghai Lu 提交于 8月 25, 2010

1. replace find_e820_area with memblock_find_in_range
2. replace reserve_early with memblock_x86_reserve_range
3. replace free_early with memblock_x86_free_range.
4. NO_BOOTMEM will switch to use memblock too.
5. use _e820, _early wrap in the patch, in following patch, will
   replace them all
6. because memblock_x86_free_range support partial free, we can remove some special care
7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
   so adjust some calling later in setup.c::setup_arch()
   -- corruption_check and mptable_update

-v2: Move reserve_brk() early
    Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range()
    that could happen We have more then 128 RAM entry in E820 tables, and
    memblock_x86_fill() could use memblock_find_in_range() to find a new place for
    memblock.memory.region array.
    and We don't need to use extend_brk() after fill_memblock_area()
    So move reserve_brk() early before fill_memblock_area().
-v3: Move find_smp_config early
    To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable
    in right place.
-v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in
    memblock.reserved already..
    use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
-v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit
    active_region for 32bit does include high pages
    need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
-v6: Use current_limit instead
-v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
-v8: Set memblock_can_resize early to handle EFI with more RAM entries
-v9: update after kmemleak changes in mainline
Suggested-by: NDavid S. Miller <davem@davemloft.net>
Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

72d7c3b3

x86, memblock: Use memblock_debug to control debug message print out · 301ff3e8

由 Yinghai Lu 提交于 8月 25, 2010

Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is
specified

will also print ther name when reserve_memblock_area/free_memblock_area are called.

-v2: according to Ingo, put " if (memblock_debug) " in one place
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

301ff3e8

x86, memblock: Add memblock_x86_memory_in_range() · e82d42be

由 Yinghai Lu 提交于 8月 25, 2010

It will return memory size in specified range according to memblock.memory.region

Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to
__memblock_x86_memory_in_range().

-v2: Ben want _in_range in the name instead of size
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

e82d42be

x86, memblock: Add memblock_x86_free_memory_in_range() · b52c17ce

由 Yinghai Lu 提交于 8月 25, 2010

It will return free memory size in specified range.

We can not use memory_size - reserved_size here, because some reserved area
may not be in the scope of memblock.memory.region.

Use memblock.memory.region subtracting memblock.reserved.region to get free range array.
then count size of all free ranges.

-v2: Ben insist on using _in_range
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

b52c17ce

x86, memblock: Add memblock_x86_find_in_range_node() · 6bcc8176

由 Yinghai Lu 提交于 8月 25, 2010

It can be used to find NODE_DATA for numa.

Need to make sure early_node_map[] is filled before it is called, otherwise
it will fallback to memblock_find_in_range(), with node range.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

6bcc8176

x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size() · 88ba088c

由 Yinghai Lu 提交于 8月 25, 2010

memblock_x86_register_active_regions() will be used to fill early_node_map,
the result will be memblock.memory.region AND numa data

memblock_x86_hole_size will be used to find hole size on memblock.memory.region
with specified range.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

88ba088c

x86, memblock: Add get_free_all_memory_range() · 4d5cf86c

由 Yinghai Lu 提交于 8月 25, 2010

get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It will use early_node_map aka active ranges subtract memblock.reserved to
get all free range, and those ranges will convert to slab pages.

-v4: increase range size
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

4d5cf86c

x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range · 9dc5d569

由 Yinghai Lu 提交于 8月 25, 2010

They are wrappers for core versions, which take start/end/name instead
of base/size.  This will make x86 conversion eaasier.

could add more debug print out

-v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael
      Ellerman and Ben
     change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman
-v3: call check_and_double after reserve/free, so could avoid to use
      find_memblock_area. Suggested by Michael Ellerman
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

9dc5d569

x86, memblock: Add memblock_x86_to_bootmem() · 27de7943

由 Yinghai Lu 提交于 8月 25, 2010

memblock_x86_to_bootmem() will reserve memblock.reserved.region in
bootmem after bootmem is set up.

We can use it to with all arches that support memblock later.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

27de7943

bootmem, x86: Add weak version of reserve_bootmem_generic · f88eff74

由 Yinghai Lu 提交于 8月 25, 2010

It will be used memblock_x86_to_bootmem converting

It is an wrapper for reserve_bootmem, and x86 64bit is using special one.

Also clean up that version for x86_64. We don't need to take care of numa
path for that, bootmem can handle it how
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

f88eff74

x86, memblock: Add memblock_x86_find_in_range_size() · fb74fb6d

由 Yinghai Lu 提交于 8月 25, 2010

size is returned according free range.
Will be used to find free ranges for early_memtest and memory corruption check

Do not mess it up with lib/memblock.c yet.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

fb74fb6d

27 8月, 2010 1 次提交

x86, mm: Make spurious_fault check explicitly check the PRESENT bit · 660a293e

由 Shaohua Li 提交于 7月 27, 2010

pte_present() returns true even present bit isn't set but _PAGE_PROTNONE
(global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have
global bit set but present bit clear. This patch makes we could catch
free pages access with CONFIG_DEBUG_PAGEALLOC enabled.

[ hpa: added a comment in the code as a warning to janitors ]
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

660a293e

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功