1. 26 Feb 2010 (1 commit)
  2. 25 Feb 2010 (1 commit)
    • x86, mm: Allow highmem user page tables to be disabled at boot time · 14315592
      Authored by Ian Campbell
      Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
      enable CONFIG_HIGHPTE for any x86 configuration which has highmem
      enabled. This means that the overhead applies even to machines which
      have a fairly modest amount of high memory and which therefore do not
      really benefit from allocating PTEs in high memory but still pay the
      price of the additional mapping operations.
      
      Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
      no actual highptes being allocated there was a reduction in system
      time used from 59.737s to 55.9s.
      
      With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 175.396 (0.238914)
        User Time 515.983 (5.85019)
        System Time 59.737 (1.26727)
        Percent CPU 263.8 (71.6796)
        Context Switches 39989.7 (4672.64)
        Sleeps 42617.7 (246.307)
      
      With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 174.278 (0.831968)
        User Time 515.659 (6.07012)
        System Time 55.9 (1.07799)
        Percent CPU 263.8 (71.266)
        Context Switches 39929.6 (4485.13)
        Sleeps 42583.7 (373.039)
      
      This patch allows the user to control the allocation of PTEs in
      highmem from the command line ("userpte=nohigh") but retains the
      status-quo as the default.
      
      It is possible that some simple heuristic could be developed which
      allows auto-tuning of this option however I don't have a sufficiently
      large machine available to me to perform any particularly meaningful
      experiments. We could probably handwave up an argument for a threshold
      at 16G of total RAM.
      
      Assuming 768M of lowmem we have 196608 potential lowmem PTE
      pages. Each page can map 2M of RAM in a PAE-enabled configuration,
      meaning a maximum of 384G of RAM could potentially be mapped using
      lowmem PTEs.
      
      Even allowing a generous factor of 10 to account for other required
      lowmem allocations, plus generous slop to account for page sharing
      (which reduces the total amount of RAM mappable by a given number of
      PT pages) and other inaccuracies in the estimates, it would seem that
      even a 32G machine would not have a particularly pressing need for
      highmem PTEs. I think 32G can be considered the upper bound of what
      might be sensible on a 32-bit machine (although in practice 64G is
      still supported).
      
      It seems questionable whether HIGHPTE is even a win for any amount of
      RAM you would sensibly run a 32-bit kernel on rather than going 64-bit.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  3. 23 Feb 2010 (1 commit)
    • x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already · 281ff33b
      Authored by Suresh Siddha
      We currently enforce the !RW mapping for the kernel mapping that maps
      holes between the text, rodata and data sections. However, the kernel
      identity mappings will have different RWX permissions for the pages
      mapping text and for the (freed) padding pages around the text and
      rodata sections. Hence the kernel identity mappings will be broken
      into smaller pages. For 64-bit, kernel text and kernel identity
      mappings are different, so we can enable the protection checks that
      come with CONFIG_DEBUG_RODATA and still retain 2MB large page
      mappings for kernel text.
      
      Konrad reported a boot failure with the Linux Xen paravirt guest
      because of this. In the paravirt guest case, the kernel text mapping
      and the kernel identity mapping share the same page-table pages, so
      forcing the !RW mapping for some of the kernel mappings also makes
      the kernel identity mappings read-only, resulting in the boot
      failure. The Linux Xen paravirt guest also uses 4k mappings and does
      not use 2M mappings.
      
      Fix this issue, and retain the large-page performance advantage for
      native kernels, by not working hard and not enforcing !RW for the
      kernel text mapping if the current mapping is already using small
      pages.
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1266522700.2909.34.camel@sbs-t61.sc.intel.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@kernel.org	[2.6.32, 2.6.33]
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  4. 18 Feb 2010 (2 commits)
  5. 16 Feb 2010 (3 commits)
    • x86, numa: Remove configurable node size support for numa emulation · ca2107c9
      Authored by David Rientjes
      Now that numa=fake=<size>[MG] is implemented, it is possible to remove
      configurable node size support.  The command-line parsing was already
      broken (numa=fake=*128, for example, would not work) and since fake nodes
      are now interleaved over physical nodes, this support is no longer
      required.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, numa: Add fixed node size option for numa emulation · 8df5bb34
      Authored by David Rientjes
      numa=fake=N specifies the number of fake nodes, N, to partition the
      system into and then allocates them by interleaving over physical nodes.
      This requires knowledge of the system capacity when attempting to
      allocate nodes of a certain size: either very large nodes to benchmark
      scalability of code that operates on individual nodes, or very small
      nodes to find bugs in the VM.
      
      This patch introduces numa=fake=<size>[MG] so it is possible to specify
      the size of each node to allocate.  When used, nodes of the size
      specified will be allocated and interleaved over the set of physical
      nodes.
      
      FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
      include/asm/numa_64.h.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, numa: Fix numa emulation calculation of big nodes · 68fd111e
      Authored by David Rientjes
      numa=fake=N uses split_nodes_interleave() to partition the system into N
      fake nodes.  Each node size must be a multiple of FAKE_NODE_MIN_SIZE,
      otherwise it is possible to get strange alignments.  Because of this,
      the remaining memory from each node, when rounded to
      FAKE_NODE_MIN_SIZE, is consolidated into a number of "big nodes" that
      are bigger than the rest.
      
      The calculation of the number of big nodes is incorrect: it uses a
      logical AND operator where it should multiply the rounded-off portion
      of each node by N.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342230.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  6. 13 Feb 2010 (3 commits)
    • x86: Make 32bit support NO_BOOTMEM · 59be5a8e
      Authored by Yinghai Lu
      Let's make 32bit consistent with 64bit.
      
      -v2: Andrew pointed out for 32bit that we should use -1ULL
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-25-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • sparsemem: Put mem map for one node together. · 9bdac914
      Authored by Yinghai Lu
      Add vmemmap_alloc_block_buf for the mem map only.
      
      It falls back to the old way if it cannot get a block that big.
      
      Before this patch, when a node has 128GB of RAM installed, the memmap
      is split into two or more parts:
      [    0.000000]  [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
      [    0.000000]  [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
      [    0.000000]  [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
      [    0.000000]  [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
      [    0.000000]  [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
      [    0.000000]  [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
      [    0.000000]  [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
      [    0.000000]  [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
      [    0.000000]  [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
      [    0.000000]  [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
      [    0.000000]  [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
      [    0.000000]  [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
      [    0.000000]  [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
      [    0.000000]  [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
      [    0.000000]  [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7
      
      After the patch we get:
      [    0.000000]  [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
      [    0.000000]  [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
      [    0.000000]  [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7
      
      -v2: change buf to vmemmap_buf instead according to Ingo
           also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
      -v3: according to Andrew, use sizeof(name) instead of hard coded 15
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Make 64 bit use early_res instead of bootmem before slab · 08677214
      Authored by Yinghai Lu
      Finally we can use early_res to replace bootmem for x86_64 now.
      
      CONFIG_NO_BOOTMEM still controls whether this is enabled.
      
      -v2: fix 32-bit compile issue with MAX_DMA32_PFN
      -v3: folded in the bug fix from the LKML message below
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B747239.4070907@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  7. 11 Feb 2010 (2 commits)
    • x86: Make early_node_mem get mem > 4 GB if possible · cef625ee
      Authored by Yinghai Lu
      This lets us put the pgdat for the node high in memory, so the later
      sparse vmemmap setup gets the section numbers it needs.
      
      With this patch, RAM below 4 GB is not consumed by the sparse vmemmap
      allocations.
      
      Before this patch we get (before swiotlb tries to get bootmem):
      [    0.000000] nid=1 start=0 end=2080000 aligned=1
      [    0.000000]   free [10 - 96]
      [    0.000000]   free [b12 - 1000]
      [    0.000000]   free [359f - 38a3]
      [    0.000000]   free [38b5 - 3a00]
      [    0.000000]   free [41e01 - 42000]
      [    0.000000]   free [73dde - 73e00]
      [    0.000000]   free [73fdd - 74000]
      [    0.000000]   free [741dd - 74200]
      [    0.000000]   free [743dd - 74400]
      [    0.000000]   free [745dd - 74600]
      [    0.000000]   free [747dd - 74800]
      [    0.000000]   free [749dd - 74a00]
      [    0.000000]   free [74bdd - 74c00]
      [    0.000000]   free [74ddd - 74e00]
      [    0.000000]   free [74fdd - 75000]
      [    0.000000]   free [751dd - 75200]
      [    0.000000]   free [753dd - 75400]
      [    0.000000]   free [755dd - 75600]
      [    0.000000]   free [757dd - 75800]
      [    0.000000]   free [759dd - 75a00]
      [    0.000000]   free [75bdd - 7bf5f]
      [    0.000000]   free [7f730 - 7f750]
      [    0.000000]   free [100000 - 2080000]
      [    0.000000]   total free 1f87170
      [   93.301474] Placing 64MB software IO TLB between ffff880075bdd000 - ffff880079bdd000
      [   93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000
      
      With this patch we get (before swiotlb tries to get bootmem):
      [    0.000000] nid=1 start=0 end=2080000 aligned=1
      [    0.000000]   free [a - 96]
      [    0.000000]   free [702 - 1000]
      [    0.000000]   free [359f - 3600]
      [    0.000000]   free [37de - 3800]
      [    0.000000]   free [39dd - 3a00]
      [    0.000000]   free [3bdd - 3c00]
      [    0.000000]   free [3ddd - 3e00]
      [    0.000000]   free [3fdd - 4000]
      [    0.000000]   free [41dd - 4200]
      [    0.000000]   free [43dd - 4400]
      [    0.000000]   free [45dd - 4600]
      [    0.000000]   free [47dd - 4800]
      [    0.000000]   free [49dd - 4a00]
      [    0.000000]   free [4bdd - 4c00]
      [    0.000000]   free [4ddd - 4e00]
      [    0.000000]   free [4fdd - 5000]
      [    0.000000]   free [51dd - 5200]
      [    0.000000]   free [53dd - 5400]
      [    0.000000]   free [55dd - 7bf5f]
      [    0.000000]   free [7f730 - 7f750]
      [    0.000000]   free [100428 - 100600]
      [    0.000000]   free [13ea01 - 13ec00]
      [    0.000000]   free [170800 - 2080000]
      [    0.000000]   total free 1f87170
      
      [   92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
      [   92.699799] Placing 64MB software IO TLB between ffff8800055dd000 - ffff8800095dd000
      [   92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000
      
      So swiotlb gets enough space below 4G, i.e. below pfn 0x100000.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-15-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Call early_res_to_bootmem one time · 1842f90c
      Authored by Yinghai Lu
      Simplify setup_node_mem: don't use bootmem from another node; instead
      just use find_e820_area in early_node_mem.
      
      This keeps the boundary between early_res and bootmem clearer, and
      lets us call early_res_to_bootmem() only once instead of once per
      node.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  8. 03 Feb 2010 (2 commits)
  9. 02 Feb 2010 (2 commits)
  10. 28 Jan 2010 (1 commit)
    • x86: Use helpers for rlimits · 2854e72b
      Authored by Jiri Slaby
      Make sure the compiler won't do weird things with the limits.
      Fetching them twice may return two different values after writable
      limits are implemented.
      
      We can either use the rlimit helpers added in
      3e10e716 or ACCESS_ONCE if they are not
      applicable; this patch uses the helpers.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      LKML-Reference: <1264609942-24621-1-git-send-email-jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  11. 23 Jan 2010 (1 commit)
    • x86: Set hotpluggable nodes in nodes_possible_map · 3a5fc0e4
      Authored by David Rientjes
      nodes_possible_map does not currently include nodes that have SRAT
      entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is
      cleared in nodes_parsed if it does not have an online address range.
      
      Unequivocally setting the bit in nodes_parsed is insufficient since
      existing code, such as acpi_get_nodes(), assumes all nodes in the map
      have online address ranges.  In fact, all code using nodes_parsed
      assumes such nodes represent an address range of online memory.
      
      nodes_possible_map is created by unioning nodes_parsed and
      cpu_nodes_parsed; the former represents nodes with online memory and
      the latter represents memoryless nodes.  We now set the bit for
      hotpluggable nodes in cpu_nodes_parsed so that it also gets set in
      nodes_possible_map.
      
      [ hpa: Haicheng Li points out that this makes the naming of the
        variable cpu_nodes_parsed somewhat counterintuitive.  However, leave
        it as is in the interest of keeping the pure bug fix patch small. ]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Tested-by: Haicheng Li <haicheng.li@linux.intel.com>
      LKML-Reference: <alpine.DEB.2.00.1001201152040.30528@chino.kir.corp.google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  12. 17 Jan 2010 (1 commit)
  13. 12 Jan 2010 (1 commit)
  14. 30 Dec 2009 (1 commit)
    • x86: Lift restriction on the location of FIX_BTMAP_* · 499a5f1e
      Authored by Jan Beulich
      The early ioremap fixmap entries cover half (or for 32-bit
      non-PAE, a quarter) of a page table, yet so far they were
      unconditionally aligned to a 256-entry boundary. This is not
      necessary if the range of page table entries falls into a single
      page table anyway.
      
      This buys back, for (theoretically) 50% of all configurations
      (25% of all non-PAE ones), at least some of the lowmem
      necessarily lost with commit e621bd18.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 28 Dec 2009 (1 commit)
  16. 17 Dec 2009 (1 commit)
    • x86: Fix checking of SRAT when node 0 ram is not from 0 · 32996250
      Authored by Yinghai Lu
      Found one system that boots from socket1 instead of socket0; its SRAT gets rejected...
      
      [    0.000000] SRAT: Node 1 PXM 0 0-a0000
      [    0.000000] SRAT: Node 1 PXM 0 100000-80000000
      [    0.000000] SRAT: Node 1 PXM 0 100000000-2080000000
      [    0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000
      [    0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000
      [    0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000
      [    0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000
      [    0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000
      [    0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000
      [    0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000
      ...
      [    0.000000] NUMA: Allocated memnodemap from 500000 - 701040
      [    0.000000] NUMA: Using 20 for the hash shift.
      [    0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used
      [    0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used
      [    0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used
      [    0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used
      [    0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used
      [    0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used
      [    0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used
      [    0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used
      [    0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used
      [    0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used
      [    0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used.
      [    0.000000] SRAT: SRAT not used.
      
      The early_node_map is not sorted, because node 0, with its non-zero
      start, comes first.
      
      So sort it right away after all regions are registered.
      
      This also fixes a regression introduced by 8716273c (x86: Export srat
      physical topology).
      
      -v2: make it more robust, handling cross-node cases like node0
           [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g)
      -v3: update comments.
      Reported-and-tested-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B2579D2.3010201@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  17. 14 Dec 2009 (1 commit)
  18. 10 Dec 2009 (3 commits)
    • vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Authored by Christoph Hellwig
      While Linux has provided an O_SYNC flag basically since day 1, it took
      until Linux 2.4.0-test12pre2 to actually get it implemented for
      filesystems. Since that day we have had generic_osync_around with only
      minor changes, and the great "For now, when the user asks for O_SYNC,
      we'll actually give O_DSYNC" comment.  This patch intends to actually
      give us real O_SYNC semantics in addition to the O_DSYNC semantics.
      After Jan's O_SYNC patches, which are required before this patch, it's
      actually surprisingly simple: we just need to figure out when to set
      the datasync flag to vfs_fsync_range and when not to.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping
      its numerical value to preserve binary compatibility, and adds a new
      real O_SYNC flag.  To guarantee backwards compatibility it is defined
      as expanding to both O_DSYNC and the new additional binary flag
      (__O_SYNC), so we remain backwards-compatible when compiled against
      the new headers.
      
      This also means that all places that don't care about the difference
      can just check O_DSYNC and get the right behaviour for O_SYNC too;
      only places that actually care need to check __O_SYNC in addition.
      Drivers and network filesystems have been updated in a fail-safe way
      to always do the full sync magic if O_DSYNC is set.  The few places
      setting O_SYNC for lower layers are kept that way for now to stay
      fail-safe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up its headers, as it already defined
      an O_DSYNC that has always been a no-op.  We try to repair this by
      using it for the new O_DSYNC and redefining O_SYNC to comprise both
      the traditional O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • x86: mmio-mod.c: Use pr_fmt · 3a0340be
      Authored by Joe Perches
      - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
      - Remove #define NAME
      - Remove NAME from pr_<level>
      Signed-off-by: Joe Perches <joe@perches.com>
      LKML-Reference: <009cb214c45ef932df0242856228f4739cc91408.1260383912.git.joe@perches.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: kmmio.c: Add and use pr_fmt(fmt) · 1bd591a5
      Authored by Joe Perches
      - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
      - Strip "kmmio: " from pr_<level>s
      Signed-off-by: Joe Perches <joe@perches.com>
      LKML-Reference: <7aa509f8a23933036d39f54bd51e9acc52068049.1260383912.git.joe@perches.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 06 Dec 2009 (1 commit)
  20. 04 Dec 2009 (1 commit)
  21. 26 Nov 2009 (1 commit)
  22. 24 Nov 2009 (4 commits)
  23. 23 Nov 2009 (4 commits)
    • x86: Suppress stack overrun message for init_task · 0e7810be
      Authored by Jan Beulich
      init_task doesn't get its stack end location set to
      STACK_END_MAGIC, and hence the message is confusing
      rather than helpful in this case.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      LKML-Reference: <4B06AEFE02000078000211F4@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Authored by Yinghai Lu
      The CPU-to-node mapping is set via the following sequence:
      
       1. numa_init_array(): set up a round-robin mapping from CPU to
          online node.
      
       2. init_cpu_to_node(): set the mapping according to apicid_to_node[]
          from the SRAT, but only for nodes that are online; CPUs on nodes
          without RAM (i.e. not online) are left round-robin.
      
       3. Later, srat_detect_node() for Intel/AMD uses the first online
          node or a nearby node.
      
      The problem is that setup_per_cpu_areas() is not called between steps
      2 and 3, so the per_cpu area for a CPU on a node with RAM may end up
      on a different node, possibly two hops away.
      
      So optimize this: add find_near_online_node() and call it from
      init_cpu_to_node().
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, numa, bootmem: Only free bootmem on NUMA failure path · 021428ad
      Authored by Yinghai Lu
      In the NUMA bootmem setup failure path we freed nodedata_phys
      incorrectly.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: apic: Print out SRAT table APIC id in hex · 163d3866
      Authored by Yinghai Lu
      Make it consistent with the APIC MADT printout; on big systems the
      APIC id is more readable in hex.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 19 Nov 2009 (1 commit)
    • x86: Eliminate redundant/contradicting cache line size config options · 350f8f56
      Authored by Jan Beulich
      Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
      (with inconsistent defaults), just having the latter suffices as
      the former can be easily calculated from it.
      
      To be consistent, also change X86_INTERNODE_CACHE_BYTES to
      X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
      to account for last level cache line size (which here matters
      more than L1 cache line size).
      
      Finally, make sure the default value for X86_L1_CACHE_SHIFT,
      when X86_GENERIC is selected, is being seen before that for the
      individual CPU model options (other than on x86-64, where
      GENERIC_CPU is part of the choice construct, X86_GENERIC is a
      separate option on ix86).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Acked-by: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>