提交 · 2ab640379a0ab4cef746ced1d7e04a0941774bcb · OpenHarmony / kernel_linux

13 11月, 2008 1 次提交

x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set · 97a70e54

由 Rafael J. Wysocki 提交于 11月 12, 2008

Impact: fix crash during hibernation on 32-bit NUMA

The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory.  For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory.  As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.

Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present.  Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

97a70e54

06 11月, 2008 1 次提交

x86: align DirectMap in /proc/meminfo · b9c3bfc2

由 Hugh Dickins 提交于 11月 06, 2008

Impact: right-align /proc/meminfo consistent with other fields

When the split-LRU patches added Inactive(anon) and Inactive(file) lines
to /proc/meminfo, all counts were moved two columns rightwards to fit in.
Now move x86's DirectMap lines two columns rightwards to line up.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b9c3bfc2

31 10月, 2008 2 次提交

x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps · fd940934

由 Keith Packard 提交于 10月 30, 2008

Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM

This takes the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic IO resource mapping.

These fixmaps are available even with CONFIG_HIGHMEM turned off.
Signed-off-by: NKeith Packard <keithp@keithp.com>
Signed-off-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fd940934

x86: fix /dev/mem mmap breakage when PAT is disabled · 9e41bff2

由 Ravikiran G Thirumalai 提交于 10月 30, 2008

Impact: allow /dev/mem mmaps on non-PAT CPUs/platforms

Fix mmap to /dev/mem when CONFIG_X86_PAT is off and CONFIG_STRICT_DEVMEM is
off

mmap to /dev/mem on kernel memory has been failing since the
introduction of PAT (CONFIG_STRICT_DEVMEM=n case).   Seems like
the check to avoid cache aliasing with PAT is kicking in even
when PAT is disabled. The bug seems to have crept in 2.6.26.

This patch makes sure that the mmap to regular
kernel memory succeeds if CONFIG_STRICT_DEVMEM=n and
PAT is disabled, and the checks to avoid cache aliasing
still happens if PAT is enabled.
Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
Tested-by: NTim Sirianni <tim@scalemp.com>
Cc: <stable@kernel.org>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9e41bff2

29 10月, 2008 4 次提交

x86: remove debug code from arch_add_memory() · fe8b868e

由 Gary Hade 提交于 10月 28, 2008

Impact: remove incorrect WARN_ON(1)

Gets rid of dmesg spam created during physical memory hot-add which
will very likely confuse users.  The change removes what appears to
be debugging code which I assume was unintentionally included in:

  x86: arch/x86/mm/init_64.c printk fixes
  commit 10f22ddeSigned-off-by: NGary Hade <garyhade@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fe8b868e

x86: start annotating early ioremap pointers with __iomem · 1d6cf1fe

由 Harvey Harrison 提交于 10月 28, 2008

Impact: some new sparse warnings in e820.c etc, but no functional change.

As with regular ioremap, iounmap etc, annotate with __iomem.

Fixes the following sparse warnings, will produce some new ones
elsewhere in arch/x86 that will get worked out over time.

arch/x86/mm/ioremap.c:402:9: warning: cast removes address space of expression
arch/x86/mm/ioremap.c:406:10: warning: cast adds address space to expression (<asn:2>)
arch/x86/mm/ioremap.c:782:19: warning: Using plain integer as NULL pointer
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1d6cf1fe

x86: two trivial sparse annotations · 9352f569

由 Harvey Harrison 提交于 10月 28, 2008

Impact: fewer sparse warnings, no functional changes

arch/x86/kernel/vsmp_64.c:87:14: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:87:14: expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:87:14: got void *[assigned] address
arch/x86/kernel/vsmp_64.c:88:22: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:88:22: expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:88:22: got void *
arch/x86/kernel/vsmp_64.c:100:23: warning: incorrect type in argument 2 (different address spaces)
arch/x86/kernel/vsmp_64.c:100:23: expected void volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:100:23: got void *
arch/x86/kernel/vsmp_64.c:101:23: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/vsmp_64.c:101:23: expected void const volatile [noderef] <asn:2>*addr
arch/x86/kernel/vsmp_64.c:101:23: got void *
arch/x86/mm/gup.c:235:6: warning: incorrect type in argument 1 (different base types)
arch/x86/mm/gup.c:235:6: expected void const volatile [noderef] <asn:1>*<noident>
arch/x86/mm/gup.c:235:6: got unsigned long [unsigned] [assigned] start
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9352f569

x86: fix init_memory_mapping for [dc000000 - e0000000) - v2 · f96f57d9

由 Yinghai Lu 提交于 10月 28, 2008

Impact: change over-mapping to precise mapping, fix /proc/meminfo output

v2: fix less than 1G ram system handling

when gart aperture is 0xdc000000 - 0xe0000000
it return 0xc0000000 - 0xe0000000

that is not right.

this patch fix that will get exact mapping

on 256g sytem with that aperture after patch
LBSuse:~ # cat /proc/meminfo
MemTotal:       264742432 kB
MemFree:        263920628 kB
Buffers:            1416 kB
Cached:            24468 kB
...
DirectMap4k:      5760 kB
DirectMap2M:   3205120 kB
DirectMap1G:  265289728 kB

it is consistent to
LBSuse:~ # cat /sys/kernel/debug/kernel_page_tables
..
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000200000           2M     RW             GLB x  pte
0xffff880000200000-0xffff880040000000        1022M     RW         PSE GLB x  pmd
0xffff880040000000-0xffff8800c0000000           2G     RW         PSE GLB NX pud
0xffff8800c0000000-0xffff8800d7e00000         382M     RW         PSE GLB NX pmd
0xffff8800d7e00000-0xffff8800d7fa0000        1664K     RW             GLB NX pte
0xffff8800d7fa0000-0xffff8800d8000000         384K                           pte
0xffff8800d8000000-0xffff8800dc000000          64M                           pmd
0xffff8800dc000000-0xffff8800e0000000          64M     RW         PSE GLB NX pmd
0xffff8800e0000000-0xffff880100000000         512M                           pmd
0xffff880100000000-0xffff880800000000          28G     RW         PSE GLB NX pud
0xffff880800000000-0xffff880824600000         582M     RW         PSE GLB NX pmd
0xffff880824600000-0xffff8808247f0000        1984K     RW             GLB NX pte
0xffff8808247f0000-0xffff880824800000          64K     RW     PCD     GLB NX pte
0xffff880824800000-0xffff880840000000         440M     RW         PSE GLB NX pmd
0xffff880840000000-0xffff884000000000         223G     RW         PSE GLB NX pud
0xffff884000000000-0xffff884028000000         640M     RW         PSE GLB NX pmd
0xffff884028000000-0xffff884040000000         384M                           pmd
0xffff884040000000-0xffff888000000000         255G                           pud
0xffff888000000000-0xffffc20000000000       58880G                           pgd
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f96f57d9

28 10月, 2008 3 次提交

x86: 64 bit print out absent pages num too · 11a6b0c9

由 Yinghai Lu 提交于 10月 14, 2008

so users are not confused with memhole causing big total ram

we don't need to worry about 32 bit, because memhole is always
above max_low_pfn.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

11a6b0c9

x86, memory hotplug: remove wrong -1 in calling init_memory_mapping() · 60817c9b

由 Shaohua Li 提交于 10月 27, 2008

Impact: fix crash with memory hotplug

Shuahua Li found:

| I just did some experiments on a desktop for memory hotplug and this bug
| triggered a crash in my test.
|
| Yinghai's suggestion also fixed the bug.

We don't need to round it, just remove that extra -1
Signed-off-by: NYinghai <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

60817c9b

x86: keep the /proc/meminfo page count correct · 3afa3949

由 Yinghai Lu 提交于 10月 25, 2008

Impact: get correct page count in /proc/meminfo

found page count in /proc/meminfo is nor correct on 1G system in VirtualBox 2.0.4

# cat /proc/meminfo
MemTotal:        1017508 kB
MemFree:          822700 kB
Buffers:            1456 kB
Cached:            26632 kB
SwapCached:            0 kB
...
Hugepagesize:       2048 kB
DirectMap4k:      4032 kB
DirectMap2M:  18446744073709549568 kB

with this patch get:
...
DirectMap4k:      4032 kB
DirectMap2M:   1044480 kB

which is consistent to kernel_page_tables
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000001000           4K     RW     PCD     GLB x  pte
0xffff880000001000-0xffff88000009f000         632K     RW             GLB x  pte
0xffff88000009f000-0xffff8800000a0000           4K     RW     PCD     GLB x  pte
0xffff8800000a0000-0xffff880000200000        1408K     RW             GLB x  pte
0xffff880000200000-0xffff88003fe00000        1020M     RW         PSE GLB x  pmd
0xffff88003fe00000-0xffff88003fff0000        1984K     RW             GLB NX pte
0xffff88003fff0000-0xffff880040000000          64K                           pte
0xffff880040000000-0xffff888000000000         511G                           pud
0xffff888000000000-0xffffc20000000000       58880G                           pgd
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3afa3949

23 10月, 2008 1 次提交

proc: switch /proc/meminfo to seq_file · e1759c21

由 Alexey Dobriyan 提交于 10月 15, 2008

and move it to fs/proc/meminfo.c while I'm at it.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>

e1759c21

22 10月, 2008 1 次提交

x86: memtest fix use of reserve_early() · 2cb0ebee

由 Daniele Calore 提交于 10月 13, 2008

Hi all,

Wrong usage of 2nd parameter in reserve_early call.
66/75: reserve_early(start_bad, last_bad - start_bad, "BAD RAM");
                                ^^^^^^^^^^^^^^^^^^^^

The correct way is to use 'end' address and not 'size'.
As a bonus a fix to the printk format.
Signed-off-by: NDaniele Calore <orkaan@orkaan.org>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2cb0ebee

20 10月, 2008 1 次提交

mm: rewrite vmap layer · db64fe02

由 Nick Piggin 提交于 10月 18, 2008

Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db64fe02

18 10月, 2008 1 次提交

Export kmap_atomic_pfn for DRM-GEM. · d1d8c925

由 Eric Anholt 提交于 8月 21, 2008

The driver would like to map IO space directly for copying data in when
appropriate, to avoid CPU cache flushing for streaming writes.
kmap_atomic_pfn lets us avoid IPIs associated with ioremap for this process.
Signed-off-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NDave Airlie <airlied@redhat.com>

d1d8c925

16 10月, 2008 2 次提交
- I
  genirq: remove artifacts from sparseirq removal · a1aca5de
  由 Ingo Molnar 提交于 10月 15, 2008
```
Signed-off-by: NIngo Molnar <mingo@elte.hu>
```
  a1aca5de
- Y
  x86: add after_bootmem flag for 32bit · fe648be0
  由 Yinghai Lu 提交于 8月 19, 2008
```
to prepare to use dyn_array support etc.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
```
  fe648be0
14 10月, 2008 5 次提交

mmiotrace: remove left-over marker cruft · 44274141

由 Pekka Paalanen 提交于 9月 16, 2008

Signed-off-by: NPekka Paalanen <pq@iki.fi>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

44274141

x86 mmiotrace: implement mmiotrace_printk() · 9e57fb35

由 Pekka Paalanen 提交于 9月 16, 2008

Offer mmiotrace users a function to inject markers from inside the kernel.
This depends on the trace_vprintk() patch.
Signed-off-by: NPekka Paalanen <pq@iki.fi>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9e57fb35

x86 mmiotrace: fix a rare memory leak · bbe5c783

由 Pekka Paalanen 提交于 9月 16, 2008

Signed-off-by: NPekka Paalanen <pq@iki.fi>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bbe5c783

x86: fix mmiotrace 8-bit register decoding · 611b1597

由 Pekka Paalanen 提交于 7月 21, 2008

When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.
Signed-off-by: NPekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
	<penberg@cs.helsinki.fi>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

611b1597

x86/mm: unify init task OOM handling · 3a1dfe6e

由 Ingo Molnar 提交于 10月 13, 2008

Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different.

The 64-bit codepath is the better one, because it correctly re-lookups
the vma after having dropped the ->mmap_sem.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>

3a1dfe6e

13 10月, 2008 11 次提交

x86/mm: do not trigger a kernel warning if user-space disables interrupts and... · 891cffbd

由 Linus Torvalds 提交于 10月 12, 2008

x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault

Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NArjan van de Ven <arjan@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

891cffbd

x86: change early_ioremap to use slots instead of nesting · c1a2f4b1

由 Yinghai Lu 提交于 9月 14, 2008

so we could remove the requirement that one needs to call
early_iounmap() in exactly reverse order of early_ioremap().
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c1a2f4b1

x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2 · af5c2bd1

由 Vegard Nossum 提交于 10月 03, 2008

virt_addr_valid() calls __pa(), which calls __phys_addr(). With
CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
address *isn't* valid. That's clearly wrong for virt_addr_valid().

We also incorporate the debugging checks into virt_addr_valid().
Signed-off-by: NVegard Nossum <vegardno@ben.ifi.uio.no>
Acked-by: NJiri Slaby <jirislaby@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

af5c2bd1

traps: x86: remove trace_hardirqs_fixup from pagefault handler · 69c89b5b

由 Alexander van Heukelum 提交于 9月 26, 2008

The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.

trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.
Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

69c89b5b

x86, uv: add early detection of UV system types · 2e42060c

由 Jack Steiner 提交于 9月 23, 2008

Portions of the ACPI code needs to know if a system is a UV system prior
to genapic initialization. This patch adds a call early_acpi_boot_init()
so that the apic type is discovered earlier.

V2 of the patch adding fixes from Yinghai Lu.
Much cleaner and smaller.
Signed-off-by: NJack Steiner <steiner@sgi.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e42060c

x86: make mm/gup.c more virtualization friendly · 606ee44d

由 Jan Beulich 提交于 9月 17, 2008

Since pte_flags() is much cheaper than pte_val() in some virtualized
environments (namely, Xen), use the former whereever possible.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Cc: "Nick Piggin" <npiggin@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

606ee44d

x86-64: fix combining of regions in init_memory_mapping() · 5e72d9e4

由 Jan Beulich 提交于 9月 12, 2008

When nr_range gets decremented, the same slot must be considered for
coalescing with its new successor again.

The issue is apparently pretty benign to native code, but surfaces as a
boot time crash in our forward ported Xen tree (where the page table
setup overall works differently than in native).
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5e72d9e4

x86-64: don't check for map replacement · a32ad462

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

The check prevents flags on mappings from being changed, which is not
desireable.  There's no need to check for replacing a mapping, and
x86-32 does not do this check.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a32ad462

x86: add early_memremap() · 14941779

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

early_ioremap() is also used to map normal memory when constructing
the linear memory mapping.  However, since we sometimes need to be able
to distinguish between actual IO mappings and normal memory mappings,
add a early_memremap() call, which maps with PAGE_KERNEL (as opposed
to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing
pagetables.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

14941779

x86: add _PAGE_IOMAP pte flag for IO mappings · be43d728

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

Use one of the software-defined PTE bits to indicate that a mapping is
intended for an IO address. On native hardware this is irrelevent,
since a physical address is a physical address. But in a virtual
environment, physical addresses are also virtualized, so there needs
to be some way to distinguish between pseudo-physical addresses and
actual hardware addresses; _PAGE_IOMAP indicates this intent.

By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
even appear in the final pagetable.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

be43d728

x86: rename discontig_32.c to numa_32.c · 927604c7

由 Yinghai Lu 提交于 9月 09, 2008

name it in line with its purpose.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

927604c7

12 10月, 2008 2 次提交

x86: memory corruption check - cleanup · 46eaa670

由 Ingo Molnar 提交于 10月 12, 2008

Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.

Also, remove the check from the power management code - this is a
pure x86 matter for now.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

46eaa670

x86, early_ioremap: fix fencepost error · c613ec1a

由 Alan Cox 提交于 10月 10, 2008

The x86 implementation of early_ioremap has an off by one error. If we get
an object which ends on the first byte of a page we undermap by one page and
this causes a crash on boot with the ASUS P5QL whose DMI table happens to fit
this alignment.

The size computation is currently

	last_addr = phys_addr + size - 1;
	npages = (PAGE_ALIGN(last_addr) - phys_addr)

(Consider a request for 1 byte at alignment 0...)

Closes #11693

Debugging work by Ian Campbell/Felix Geyer
Signed-off-by: NAlan Cox <alan@rehat.com>
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c613ec1a

11 10月, 2008 5 次提交

x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix · b27a43c1

由 Suresh Siddha 提交于 10月 07, 2008

Jeremy Fitzhardinge wrote:

> I'd noticed that current tip/master hasn't been booting under Xen, and I
> just got around to bisecting it down to this change.
>
> commit 065ae73c5462d42e9761afb76f2b52965ff45bd6
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
>
>    x86, cpa: make the kernel physical mapping initialization a two pass sequence
>
> This patch is causing Xen to fail various pagetable updates because it
> ends up remapping pagetables to RW, which Xen explicitly prohibits (as
> that would allow guests to make arbitrary changes to pagetables, rather
> than have them mediated by the hypervisor).

Instead of making init a two pass sequence, to satisfy the Intel's TLB
Application note (developer.intel.com/design/processor/applnots/317080.pdf
Section 6 page 26), we preserve the original page permissions
when fragmenting the large mappings and don't touch the existing memory
mapping (which satisfies Xen's requirements).

Only open issue is: on a native linux kernel, we will go back to mapping
the first 0-1GB kernel identity mapping as executable (because of the
static mapping setup in head_64.S). We can fix this in a different
patch if needed.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b27a43c1

x86, pat: cleanups · ad2cde16

由 Ingo Molnar 提交于 9月 30, 2008

clean up recently added code to be more consistent with other x86 code.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ad2cde16

x86: fix pagetable init 64-bit breakage · 28dd033f

由 Suresh Siddha 提交于 9月 29, 2008

Fix _end alignment check - can trigger a crash if _end happens to be
on a page boundary.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

28dd033f

x86: track memtype for RAM in page struct · 9542ada8

由 Suresh Siddha 提交于 9月 24, 2008

Track the memtype for RAM pages in page struct instead of using the
memtype list. This avoids the explosion in the number of entries in
memtype list (of the order of 20,000 with AGP) and makes the PAT
tracking simpler.

We are using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9542ada8

x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa · ad5ca55f

由 Suresh Siddha 提交于 9月 23, 2008

Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.

With out this, we violate the TLB application note, which says
    "The TLBs may contain both ordinary and large-page translations for
     a 4-KByte range of linear addresses. This may occur if software
     modifies the paging structures so that the page size used for the
     address range changes. If the two translations differ with respect
     to page frame or attributes (e.g., permissions), processor behavior
     is undefined and may be implementation-specific."

And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ad5ca55f

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年