Commits · 1b5576e69a5fe168c08a159685ac366316ac9bbc · openthos / linux

02 Feb, 2010 4 commits

x86: Remove BIOS data range from e820 · 1b5576e6

Authored by Yinghai Lu on Jan 22, 2010

In preparation for moving to the generic page_is_ram(), make explicit
what we expect to be reserved and not reserved.
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20100122033004.335813103@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

1b5576e6

Move page_is_ram() declaration to mm.h · 53df8fdc

Authored by Wu Fengguang on Jan 27, 2010

Move page_is_ram() declaration to mm.h; it makes no sense in <linux/ioport.h>.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
LKML-Reference: <20100127030639.GD8132@localhost>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

53df8fdc

Generic page_is_ram: use __weak · e5273007

Authored by Andrew Morton on Jan 26, 2010

Use __weak instead of __attribute__((weak)).

Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

e5273007
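
For readers unfamiliar with weak symbols, here is a minimal user-space sketch of the pattern. The `__weak` macro mirrors the kernel shorthand for `__attribute__((weak))`; the function name `arch_page_is_ram_demo` is hypothetical, and the sketch assumes a GCC- or Clang-compatible toolchain.

```c
/* User-space sketch; __weak mirrors the kernel shorthand (assumes GCC/Clang). */
#ifndef __weak
#define __weak __attribute__((weak))
#endif

/*
 * Hypothetical generic fallback. A strong (non-weak) definition of the same
 * symbol in another object file would replace this one at link time.
 */
__weak int arch_page_is_ram_demo(unsigned long pfn)
{
	(void)pfn;
	return 0;	/* generic default: nothing is known to be RAM */
}
```

With no strong definition linked in, callers get the weak default; an architecture that provides its own version simply defines the symbol without the attribute.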

resources: introduce generic page_is_ram() · 61ef2489

Authored by Wu Fengguang on Jan 22, 2010

It's based on walk_system_ram_range(), for archs that don't have
their own page_is_ram().

The static versions in MIPS and SCORE are also made global.

v4: prefer plain 1 instead of PAGE_IS_RAM (H. Peter Anvin)
v3: add comment (KAMEZAWA Hiroyuki)
    "AFAIK, this "System RAM" information has been used for kdump to
    grab valid memory area and seems good for the kernel itself."
v2: add PAGE_IS_RAM macro (Américo Wang)

Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Yinghai Lu <yinghai@kernel.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
LKML-Reference: <20100122081619.GA6431@localhost>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

61ef2489
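
The idea of building page_is_ram() on top of a range walker can be sketched in user space as follows. This is not the kernel code: the table `ram_map_demo`, the walker `walk_ram_demo()` and every other `_demo` name are hypothetical stand-ins for the "System RAM" resource entries and walk_system_ram_range().

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's "System RAM" resource entries. */
struct ram_range_demo {
	unsigned long start_pfn;	/* first page frame, inclusive */
	unsigned long end_pfn;		/* last page frame, exclusive */
};

static const struct ram_range_demo ram_map_demo[] = {
	{ 0x000, 0x09f },	/* low memory below the legacy hole */
	{ 0x100, 0x800 },	/* memory above 1MB */
};

/*
 * Call func() on each part of [start_pfn, start_pfn + nr_pages) covered by
 * the table. Returns -1 if nothing overlapped, otherwise the value returned
 * by the last callback (a non-zero value stops the walk).
 */
static int walk_ram_demo(unsigned long start_pfn, unsigned long nr_pages,
			 int (*func)(unsigned long, unsigned long))
{
	unsigned long end_pfn = start_pfn + nr_pages;
	int ret = -1;
	size_t i;

	for (i = 0; i < sizeof(ram_map_demo) / sizeof(ram_map_demo[0]); i++) {
		unsigned long s = ram_map_demo[i].start_pfn;
		unsigned long e = ram_map_demo[i].end_pfn;

		if (s < start_pfn)
			s = start_pfn;
		if (e > end_pfn)
			e = end_pfn;
		if (s >= e)
			continue;	/* no overlap with this entry */
		ret = func(s, e - s);
		if (ret)
			break;
	}
	return ret;
}

/* The callback only needs to report that it was reached. */
static int is_ram_cb_demo(unsigned long pfn, unsigned long nr_pages)
{
	(void)pfn;
	(void)nr_pages;
	return 1;
}

/* Returns plain 1 for RAM and 0 otherwise, in the spirit of the v4 note. */
int page_is_ram_demo(unsigned long pfn)
{
	return walk_ram_demo(pfn, 1, is_ram_cb_demo) == 1;
}
```

With this table, page frames inside either range report 1, while frames in the gap between 0x9f and 0x100 report 0.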

01 Dec, 2009 1 commit

x86, mm: Correct the implementation of is_untracked_pat_range() · ccef0864

Authored by H. Peter Anvin on Nov 30, 2009

The semantics the PAT code expects of is_untracked_pat_range() is "is
this range completely contained inside the untracked region."  This
means that checkin 8a271389 was technically wrong, because the
implementation was needlessly confusing.

The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.
Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>

ccef0864
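
A containment test on a semiclosed range can be sketched like this. The helper name and the two constants are illustrative assumptions (the constants model the legacy 640KB to 1MB hole); the point is only that an exclusive end `e` is compared with `<=` against the exclusive end of the region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds for illustration: the legacy hole is [0xa0000, 0x100000). */
#define ISA_START_ADDRESS_DEMO	0xa0000ULL
#define ISA_END_ADDRESS_DEMO	0x100000ULL

/*
 * True when the semiclosed range [s, e) lies entirely inside
 * [ISA_START_ADDRESS_DEMO, ISA_END_ADDRESS_DEMO). Because e is exclusive,
 * e may equal the exclusive end of the region.
 */
static inline bool is_isa_range_demo(uint64_t s, uint64_t e)
{
	return s >= ISA_START_ADDRESS_DEMO && e <= ISA_END_ADDRESS_DEMO;
}
```

A caller holding a closed range [start, last] would pass `last + 1` as `e`, which is exactly the kind of "- 1" / "+ 1" bookkeeping a consistent semiclosed convention avoids.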

26 Nov, 2009 1 commit

x86/pat: Trivial: don't create debugfs for memtype if pat is disabled · dd4377b0

Authored by Xiaotian Feng on Nov 26, 2009

If pat is disabled (boot with nopat), there's no need to create
debugfs for it; it's empty all the time.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dd4377b0

25 Nov, 2009 1 commit

x86, mtrr: Fix sorting of mtrr after subtracting · 5bf65b9b

Authored by Yinghai Lu on Nov 24, 2009

In some cases we can coalesce MTRR entries after cleanup; this may
allow us to have more entries.  As such, introduce clean_sort_range
to sort and coalesce the MTRR entries.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9A3.5020908@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

5bf65b9b
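
Sorting and coalescing a range array, as described above, might look roughly like the following. This is a simplified illustration and not the kernel's clean_sort_range(); the struct and function names are hypothetical, and an `end` of 0 is assumed to mark an empty slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical semiclosed range [start, end); end == 0 marks an empty slot. */
struct range_demo {
	uint64_t start;
	uint64_t end;
};

static int cmp_range_demo(const void *a, const void *b)
{
	const struct range_demo *r1 = a, *r2 = b;

	if (r1->start < r2->start)
		return -1;
	return r1->start > r2->start;
}

/*
 * Drop empty slots, sort by start address, then merge ranges that overlap
 * or touch. Returns the number of ranges left at the front of the array.
 */
size_t sort_and_coalesce_demo(struct range_demo *r, size_t n)
{
	size_t i, used = 0, out = 0;

	for (i = 0; i < n; i++)		/* compact away empty slots */
		if (r[i].end)
			r[used++] = r[i];
	if (!used)
		return 0;

	qsort(r, used, sizeof(*r), cmp_range_demo);

	for (i = 1; i < used; i++) {
		if (r[i].start <= r[out].end) {
			if (r[i].end > r[out].end)
				r[out].end = r[i].end;	/* extend current range */
		} else {
			r[++out] = r[i];		/* start a new range */
		}
	}
	return out + 1;
}
```

Fewer ranges after merging is what frees up entries: three touching ranges collapse into one, leaving room for others.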

24 Nov, 2009 7 commits

x86: Move find_smp_config() earlier and avoid bootmem usage · b24c2a92

Authored by Yinghai Lu on Nov 24, 2009

Move the find_smp_config() call to before bootmem is initialized.
Use reserve_early() instead of reserve_bootmem() in it.

This simplifies the code: we only need to call find_smp_config()
once and can remove the now unneeded reserve parameter from
x86_init_mpparse::find_smp_config.

We thus also reduce x86's dependency on bootmem allocations.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9F2.70907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b24c2a92

x86, platform: Change is_untracked_pat_range() to bool; cleanup init · eb41c8be

Authored by H. Peter Anvin on Nov 23, 2009

- Change is_untracked_pat_range() to return bool.
- Clean up the initialization of is_untracked_pat_range() -- by default,
  we simply point it at is_ISA_range() directly.
- Move is_untracked_pat_range to the end of struct x86_platform, since
  it is the newest field.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>

eb41c8be

x86: Change is_ISA_range() into an inline function · 65f116f5

Authored by H. Peter Anvin on Nov 23, 2009

Change is_ISA_range() from a macro to an inline function.  This makes
it type safe, and also allows it to be assigned to a function pointer
if necessary.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091119202341.GA4420@sgi.com>

65f116f5

x86, mm: is_untracked_pat_range() takes a normal semiclosed range · 8a271389

Authored by H. Peter Anvin on Nov 23, 2009

is_untracked_pat_range() -- like its components, is_ISA_range() and
is_GRU_range() -- takes a normal semiclosed interval (>=, <) whereas the
PAT code called it as if it took a closed range (>=, <=).  Fix.

Although this is a bug, I believe it is non-manifest, simply because
none of the callers will call this with non-page-aligned addresses.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>

8a271389

x86, mm: Call is_untracked_pat_range() rather than is_ISA_range() · 55a6ca25

Authored by H. Peter Anvin on Nov 23, 2009

Checkin fd12a0d6 made the PAT
untracked range a platform configurable, but missed one occurrence of
is_ISA_range() which still refers to PAT-untracked memory, and
therefore should be using the configurable.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>

55a6ca25

x86: UV SGI: Don't track GRU space in PAT · fd12a0d6

Authored by Jack Steiner on Nov 19, 2009

GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is
unmapped.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
[ v2: fix build failure ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fd12a0d6

x86: SGI UV: Fix BAU initialization · e38e2af1

Authored by Cliff Wickman on Nov 19, 2009

A memory mapped register that affects the SGI UV Broadcast
Assist Unit's interrupt handling may sometimes be uninitialized.

Remove the condition on its initialization, as that condition
can be randomly satisfied by a hardware reset.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1NBGB9-0005nU-Dp@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e38e2af1

23 Nov, 2009 3 commits

x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac

Authored by Yinghai Lu on Nov 21, 2009

CPU to node mapping is set via the following sequence:

 1. numa_init_array(): Set up roundrobin from cpu to online node

 2. init_cpu_to_node(): Set that according to apicid_to_node[]
			according to srat only handle the node that
			is online, and leave other cpu on node
			without ram (aka not online) to still
			roundrobin.

 3. later call srat_detect_node for Intel/AMD, will use first_online
    node or nearby node.

Problem is that setup_per_cpu_areas() is not called between 2 and 3,
the per_cpu for cpu on node with ram is on different node, and could
put that on node with two hops away.

So try to optimize this and add find_near_online_node() and call
init_cpu_to_node().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d9c2d5ac
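
Picking the nearest online node instead of a round-robin one can be sketched as a minimum search over a distance table. The table, the online map and all `_demo` names below are made-up illustration data, not taken from the patch.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES_DEMO 4

/* Hypothetical SLIT-style distance table: 10 = local, larger = farther. */
static const int node_distance_demo[MAX_NODES_DEMO][MAX_NODES_DEMO] = {
	{ 10, 16, 16, 22 },
	{ 16, 10, 22, 16 },
	{ 16, 22, 10, 16 },
	{ 22, 16, 16, 10 },
};

/* Hypothetical online map: nodes 1 and 3 have no RAM and are not online. */
static const bool node_online_demo[MAX_NODES_DEMO] = {
	true, false, true, false
};

/* Return the online node with the smallest distance to node, or -1. */
int find_near_online_node_demo(int node)
{
	int n, best_node = -1, min_val = INT_MAX;

	for (n = 0; n < MAX_NODES_DEMO; n++) {
		if (!node_online_demo[n])
			continue;
		if (node_distance_demo[node][n] < min_val) {
			min_val = node_distance_demo[node][n];
			best_node = n;
		}
	}
	return best_node;
}
```

In this table a CPU on memoryless node 3 is mapped to node 2 (distance 16) rather than node 0 (distance 22), which is the two-hop case the message wants to avoid.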

x86, numa, bootmem: Only free bootmem on NUMA failure path · 021428ad

Authored by Yinghai Lu on Nov 21, 2009

In the NUMA bootmem setup failure path we freed nodedata_phys
incorrectly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

021428ad

x86: Change crash kernel to reserve via reserve_early() · 44280733

Authored by Yinghai Lu on Nov 22, 2009

Use find_e820_area()/reserve_early() instead.

-v2: address Eric's request to restore the original semantics:
     fail if the provided address cannot be used.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <4B09E2F9.7040403@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

44280733

19 Nov, 2009 1 commit

x86: Eliminate redundant/contradicting cache line size config options · 350f8f56

Authored by Jan Beulich on Nov 13, 2009

Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.

To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).

Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

350f8f56
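
The relationship between the shift and the byte count is a single left shift, which is why one option suffices. In the sketch below the macro names carry a `_DEMO` suffix to mark them as illustrative, and the L1 shift of 6 is an assumed example value; only the internode shift of 7 comes from the message above.

```c
/* Illustrative stand-ins for the Kconfig values named in the commit. */
#define X86_L1_CACHE_SHIFT_DEMO		6	/* assumed example value */
#define X86_INTERNODE_CACHE_SHIFT_DEMO	7	/* the NUMA value given above */

/* The byte size follows from the shift, so only the shift is configured. */
#define L1_CACHE_BYTES_DEMO		(1 << X86_L1_CACHE_SHIFT_DEMO)
#define INTERNODE_CACHE_BYTES_DEMO	(1 << X86_INTERNODE_CACHE_SHIFT_DEMO)
```

Deriving the byte count this way removes the chance of the two settings disagreeing.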

18 Nov, 2009 1 commit

x86: When cleaning MTRRs, do not fold WP into UC · 508d85c2

Authored by Yinghai Lu on Nov 16, 2009

The current MTRR code treats WP as a form of UC.  This really isn't
desirable behaviour, except possibly in the case of severe MTRR
shortage.  Disable this, to allow legitimate uses of WP to remain
unmolested.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>

508d85c2

17 Nov, 2009 6 commits

x86: remove "extern" from function prototypes in <asm/proto.h> · 5bd085b5

Authored by H. Peter Anvin on Nov 16, 2009

Function prototypes don't need "extern", and it is generally frowned
upon to have them.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

5bd085b5

x86, mm: Report state of NX protections during boot · 4b0f3b81

Authored by Kees Cook on Nov 13, 2009

It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported. Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>

4b0f3b81

x86, mm: Clean up and simplify NX enablement · 4763ed4d

Authored by H. Peter Anvin on Nov 13, 2009

The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available.  Furthermore, we had a bewildering collection of tests for
the availability of NX.

This patch:

a) merges the 32-bit set_nx() and the 64-bit check_efer() function
   into a single x86_configure_nx() function.  EFER control is left
   to the head code.

b) eliminates the nx_enabled variable entirely.  Things that need to
   test for NX enablement can verify __supported_pte_mask directly,
   and cpu_has_nx gives the supported status of NX.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>

4763ed4d
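
Points (a) and (b) can be modelled in user space as one configuration function plus a mask test. Everything with a `_demo` suffix is a hypothetical stand-in for kernel state; in particular the `disable_nx_demo` flag is an assumption added for illustration and is not mentioned in the message above.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_NX_DEMO (1ULL << 63)	/* NX is bit 63 of a 64-bit PTE */

/* Hypothetical stand-ins for kernel state. */
static uint64_t supported_pte_mask_demo = ~0ULL;
static bool cpu_has_nx_demo;		/* CPU reports NX support */
static bool disable_nx_demo;		/* assumed opt-out switch */

/* Single place that decides whether NX may appear in page table entries. */
void configure_nx_demo(void)
{
	if (cpu_has_nx_demo && !disable_nx_demo)
		supported_pte_mask_demo |= PAGE_NX_DEMO;
	else
		supported_pte_mask_demo &= ~PAGE_NX_DEMO;
}

/* Callers test the mask directly instead of a separate nx_enabled flag. */
bool nx_in_use_demo(void)
{
	return (supported_pte_mask_demo & PAGE_NX_DEMO) != 0;
}
```

Keeping the decision in the mask means there is one source of truth: code that builds page table entries and code that asks "is NX on?" cannot disagree.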

x86, pageattr: Make set_memory_(x|nx) aware of NX support · 583140af

Authored by H. Peter Anvin on Nov 13, 2009

Make set_memory_x/set_memory_nx directly aware of whether NX is supported
in the system, rather than requiring that every caller assess
that support independently.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Starling <tstarling@wikimedia.org>
Cc: Hannes Eder <hannes@hanneseder.net>
LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>

583140af

x86, sleep: Always save the value of EFER · a7c4c0d9

Authored by H. Peter Anvin on Nov 13, 2009

Always save the value of EFER, regardless of the state of NX.  Since
EFER may not actually exist, use rdmsr_safe() to do so.

v2: check the return value from rdmsr_safe() instead of relying on
    the output values being unchanged on error.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@tuxonice.net>
LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>

a7c4c0d9

x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX · 8a50e513

Authored by H. Peter Anvin on Nov 13, 2009

Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>

8a50e513

12 Nov, 2009 1 commit

x86: Make sure wakeup trampoline code is below 1MB · 196cf0d6

Authored by Yinghai Lu on Nov 10, 2009

Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.

This is more reliable, and it also removes a dependency on
bootmem.

-v2: change function name to acpi_reserve_wakeup_memory(),
     as suggested by Rafael.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

196cf0d6

08 Nov, 2009 1 commit

x86: k8.h: Add struct bootnode · 0420101c

Authored by Randy Dunlap on Oct 28, 2009

k8.h uses struct bootnode but does not #include a header file
for it, so provide a simple declaration for it.

  arch/x86/include/asm/k8.h:13: warning: 'struct bootnode' declared inside parameter list
  arch/x86/include/asm/k8.h:13: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0420101c
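
The warning quoted above goes away once the struct tag is visible at file scope before the prototype. A minimal sketch, with a hypothetical prototype `count_nodes_demo()` and assumed struct fields:

```c
/*
 * Forward declaration: tells the compiler that struct bootnode is a
 * file-scope type, so the prototype below refers to that same type
 * instead of declaring a new one scoped to the parameter list.
 */
struct bootnode;

/* Hypothetical prototype that only needs a pointer to the type. */
int count_nodes_demo(const struct bootnode *nodes, int n);

/* The full definition can appear later, or in another header. */
struct bootnode {
	unsigned long long start;	/* assumed fields for illustration */
	unsigned long long end;
};

/* Count entries that describe a non-empty range. */
int count_nodes_demo(const struct bootnode *nodes, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		if (nodes[i].end > nodes[i].start)
			count++;
	return count;
}
```

A forward declaration is enough here because the prototype only passes a pointer; the header does not need the struct's layout.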

03 Nov, 2009 3 commits

x86_64, cpa: Use only text section in set_kernel_text_rw/ro · e7d23dde

Authored by Suresh Siddha on Oct 28, 2009

set_kernel_text_rw()/set_kernel_text_ro() are marking pages
starting from _text to __start_rodata as RW or RO.

With CONFIG_DEBUG_RODATA, there might be free pages (associated
with padding the sections to 2MB large page boundary) between
text and rodata sections that are given back to page allocator.
So we should use only the start (_text) and end
(__stop___ex_table) of the text section in
set_kernel_text_rw()/set_kernel_text_ro().
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.164525222@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e7d23dde

x86_64, ftrace: Make ftrace use kernel identity mapping to modify code · 55ca3cc1

Authored by Suresh Siddha on Oct 28, 2009

On x86_64, kernel text mappings are mapped read-only with
CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead
of the kernel text mapping to modify the kernel text.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

55ca3cc1

x86, cpa: Fix kernel text RO checks in static_protection() · 502f6604

Authored by Suresh Siddha on Oct 28, 2009

Steven Rostedt reported that we are unconditionally making the
kernel text mapping as read-only. i.e., if someone does cpa() to
the kernel text area for setting/clearing any page table
attribute, we unconditionally clear the read-write attribute for
the kernel text mapping that is set at compile time.

We should delay (to forbid the write attribute) and enforce only
after the kernel has mapped the text as read-only.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com>
[ marked kernel_set_to_readonly as __read_mostly ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

502f6604

28 Oct, 2009 1 commit

tracing: allow to change permissions for text with dynamic ftrace enabled · 883242dd

Authored by Steven Rostedt on Oct 27, 2009

The commit 74e08179
x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
prevents text sections from becoming read/write using set_memory_rw.

The dynamic ftrace changes all text pages to read/write just before
converting the calls to tracing to nops, and vice versa.

I originally just added a flag to allow this transaction when ftrace
did the change, but I also found that when the CPA testing was running
it would remove the read/write as well, and ftrace does not do the text
conversion on boot up, and the CPA changes caused the dynamic tracer
to fail on self tests.

The current solution I have is simply not to prevent
change_page_attr from setting the RW bit for kernel text pages.
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

883242dd

24 Oct, 2009 1 commit

x86, boot: Simplify setting of the PAE bit · 4868402d

Authored by Alexander Potashev on Oct 24, 2009

A single 'movl' is shorter than the 'xorl'-'orl' pair.
No change in behaviour.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
LKML-Reference: <1256341043-4928-1-git-send-email-aspotashev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4868402d

23 Oct, 2009 1 commit

x86: Remove pfn in add_one_highpage_init() · b1258ac2

Authored by Minchan Kim on Oct 22, 2009

Commit cc9f7a0c changed add_one_highpage_init(). We don't use pfn
any more, so remove the unnecessary argument.

This patch doesn't change function behavior.
This patch is based on v2.6.32-rc5.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20091022112722.adc8e55c.minchan.kim@barrios-desktop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b1258ac2

20 Oct, 2009 3 commits

x86-64: add comment for RODATA large page retainment · d6cc1c3a

Authored by Suresh Siddha on Oct 19, 2009

Add a comment explaining why RODATA is aligned to 2 MB.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

d6cc1c3a

x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA · 74e08179

Authored by Suresh Siddha on Oct 14, 2009

CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
text/rodata/data to small 4KB pages as they are mapped with different
attributes (text as RO, RODATA as RO and NX etc).

On x86_64, preserve the large page mappings for kernel text/rodata/data
boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
RODATA section to be hugepage aligned and having same RWX attributes
for the 2MB page boundaries.

Extra Memory pages padding the sections will be freed during the end of the boot
and the kernel identity mappings will have different RWX permissions compared to
the kernel text mappings.

Kernel identity mappings to these physical pages will be mapped with smaller
pages but large page mappings are still retained for kernel text, rodata, data
mappings.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

74e08179
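
The padding that this alignment introduces is plain power-of-two rounding. The helpers below are a user-space sketch with hypothetical names; the kernel expresses the same alignment in its linker script rather than in C.

```c
#include <stdint.h>

#define HPAGE_SIZE_DEMO	(2ULL << 20)	/* 2MB large page */

/* Round addr up to the next multiple of align (align must be a power of 2). */
static inline uint64_t align_up_demo(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding needed so that a section starting at addr is 2MB aligned. */
static inline uint64_t hpage_padding_demo(uint64_t addr)
{
	return align_up_demo(addr, HPAGE_SIZE_DEMO) - addr;
}
```

The padding bytes correspond to the extra pages that the message says are freed at the end of boot.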

x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA · b9af7c0d

Authored by Suresh Siddha on Oct 14, 2009

In the first 2MB, kernel text is co-located with kernel static
page tables setup by head_64.S.  CONFIG_DEBUG_RODATA chops this
2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
leaving the static page tables as RW.

With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
with 2% reduction in system time and 1% improvement in iowait idle time.

To recover this, move the kernel static page tables to .data section, so that
we don't have to break the first 2MB of kernel text to small pages with
CONFIG_DEBUG_RODATA.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

b9af7c0d

13 Oct, 2009 4 commits

x86: Interleave emulated nodes over physical nodes · adc19389

Authored by David Rientjes on Sep 25, 2009

Add interleaved NUMA emulation support

This patch interleaves emulated nodes over the system's physical
nodes. This is required for interleave optimizations since
mempolicies, for example, operate by iterating over a nodemask and
act without knowledge of node distances.  It can also be used for
testing memory latencies and NUMA bugs in the kernel.

There're a couple of ways to do this:

 - divide the number of emulated nodes by the number of physical
   nodes and allocate the result on each physical node, or

 - allocate each successive emulated node on a different physical
   node until all memory is exhausted.

The disadvantage of the first option is, depending on the asymmetry
in node capacities of each physical node, emulated nodes may
substantially differ in size on a particular physical node compared
to another.

The disadvantage of the second option is, also depending on the
asymmetry in node capacities of each physical node, there may be
more emulated nodes allocated on one physical node than on another.

This patch implements the second option; we sacrifice the
possibility that we may have slightly more emulated nodes on a
particular physical node compared to another in lieu of node size
asymmetry.

 [ Note that "node capacity" of a physical node is not only a
   function of its addressable range, but also is affected by
   subtracting out the amount of reserved memory over that range.
   NUMA emulation only deals with available, non-reserved memory
   quantities. ]

We ensure there is at least a minimal amount of available memory
allocated to each node.  We also make sure that at least this
amount of available memory is available in ZONE_DMA32 for any node
that includes both ZONE_DMA32 and ZONE_NORMAL.

This patch also cleans the emulation code up by no longer passing
the statically allocated struct bootnode array among the various
functions. This init.data array is not allocated on the stack since
it may be very large and thus it may be accessed at file scope.

The WARN_ON() for nodes_cover_memory() when faking proximity
domains is removed since it relies on successive nodes always
having greater start addresses than previous nodes; with
interleaving this is no longer always true.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

adc19389
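
The second option described above (place each successive emulated node on a different physical node until memory runs out) can be sketched as a round-robin loop. This is a simplified model, not the patch's code: it ignores reserved memory, minimum sizes and ZONE_DMA32, and all names are hypothetical.

```c
#include <stdint.h>

/*
 * Carve up to nr_emu emulated nodes of emu_size bytes each out of the
 * physical nodes, visiting the physical nodes round-robin. phys_free[i]
 * holds the available bytes left on physical node i and is updated in
 * place. owner[k] receives the physical node backing emulated node k.
 * Returns the number of emulated nodes actually created.
 */
int interleave_nodes_demo(uint64_t *phys_free, int nr_phys,
			  uint64_t emu_size, int nr_emu, int *owner)
{
	int created = 0;

	if (nr_phys <= 0 || emu_size == 0)
		return 0;

	while (created < nr_emu) {
		int placed_this_pass = 0;
		int i;

		for (i = 0; i < nr_phys && created < nr_emu; i++) {
			if (phys_free[i] < emu_size)
				continue;	/* this physical node is exhausted */
			phys_free[i] -= emu_size;
			owner[created++] = i;
			placed_this_pass = 1;
		}
		if (!placed_this_pass)
			break;		/* no physical node can fit another */
	}
	return created;
}
```

With physical nodes of 300 and 100 units and an emulated node size of 100, the owners come out as 0, 1, 0, 0: equal-sized emulated nodes, but more of them on the larger physical node, which is the trade-off the message accepts.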

x86: Export srat physical topology · 8716273c

Authored by David Rientjes on Sep 25, 2009

This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already separates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init().  This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.

acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located.  If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.

acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8716273c

x86: Export k8 physical topology · 8ee2debc

Authored by David Rientjes on Sep 25, 2009

To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it.  This does the k8 node setup in two parts:
detection and registration.  NUMA emulation can then use the
physical topology detected to set up the address ranges of emulated
nodes accordingly.  If emulation isn't used, the k8 nodes are
registered as normal.

Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time.  This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.

This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch.  The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.

This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.

k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8ee2debc

x86: Clean up and add missing log levels for k8 · 1af5ba51

Authored by David Rientjes on Sep 25, 2009

Convert all printk's in arch/x86/mm/k8topology_64.c to use
pr_info() or pr_err() appropriately.

Adds log levels for messages currently lacking them.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1af5ba51