提交 · 5d5791af4c0d4fd32093882357506355c3357503 · openeuler / Kernel

05 8月, 2011 1 次提交

x86-64, xen: Enable the vvar mapping · 5d5791af

由 Andy Lutomirski 提交于 8月 03, 2011

Xen needs to handle VVAR_PAGE, introduced in git commit:
9fd67b4e
x86-64: Give vvars their own page

Otherwise we die during bootup with a message like:

(XEN) mm.c:940:d10 Error getting mfn 1888 (pfn 1e3e48) from L1 entry
      8000000001888465 for l1e_owner=10, pg_owner=10
(XEN) mm.c:5049:d10 ptwr_emulate: could not get_page_from_l1e()
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff8103a930>] xen_set_pte+0x20/0xe0
Signed-off-by: NAndy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/4659478ed2f3480938f96491c2ecbe2b2e113a23.1312378163.git.luto@mit.eduReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

5d5791af

30 6月, 2011 1 次提交

xen/mmu: Fix for linker errors when CONFIG_SMP is not defined. · 32dd1194

由 Konrad Rzeszutek Wilk 提交于 6月 30, 2011

Simple enough - we use an extern defined symbol which is not
defined when CONFIG_SMP is not defined. This fixes the linker
dying.

CC: stable@kernel.org
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

32dd1194

17 6月, 2011 1 次提交

xen/setup: Fix for incorrect xen_extra_mem_start. · acd049c6

由 Konrad Rzeszutek Wilk 提交于 6月 16, 2011

The earlier attempts (24bdb0b6)
at fixing this problem caused other problems to surface (PV guests
with no PCI passthrough would have SWIOTLB turned on - which meant
64MB of precious contingous DMA32 memory being eaten up per guest).
The problem was: "on xen we add an extra memory region at the end of
the e820, and on this particular machine this extra memory region
would start below 4g and cross over the 4g boundary:

[0xfee01000-0x192655000)

Unfortunately e820_end_of_low_ram_pfn does not expect an
e820 layout like that so it returns 4g, therefore initial_memory_mapping
will map [0 - 0x100000000), that is a memory range that includes some
reserved memory regions."

The memory range was the IOAPIC regions, and with the 1-1 mapping
turned on, it would map them as RAM, not as MMIO regions. This caused
the hypervisor to complain. Fortunately this is experienced only under
the initial domain so we guard for it.
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

acd049c6

16 6月, 2011 2 次提交

xen: When calling power_off, don't call the halt function. · b2abe506

由 Tom Goetz 提交于 5月 16, 2011

.. As it won't actually power off the machine.
Reported-by: NSven Köhler <sven.koehler@gmail.com>
Tested-by: NSven Köhler <sven.koehler@gmail.com>
Signed-off-by: NTom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b2abe506

xen: support CONFIG_MAXSMP · 900cba88

由 Andrew Jones 提交于 12月 18, 2009

The MAXSMP config option requires CPUMASK_OFFSTACK, which in turn
requires we init the memory for the maps while we bring up the cpus.
MAXSMP also increases NR_CPUS to 4096. This increase in size exposed an
issue in the argument construction for multicalls from
xen_flush_tlb_others. The args should only need space for the actual
number of cpus.

Also in 2.6.39 it exposes a bootup problem.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8157a1d3>] set_cpu_sibling_map+0x123/0x30d
...
Call Trace:
[<ffffffff81039a3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff819dc4db>] xen_smp_prepare_cpus+0x36/0x135
..

CC: stable@kernel.org
Signed-off-by: NAndrew Jones <drjones@redhat.com>
[v2: Updated to compile on 3.0]
[v3: Updated to compile when CONFIG_SMP is not defined]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

900cba88

09 6月, 2011 1 次提交

xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped" · a91d9287

由 Stefano Stabellini 提交于 6月 03, 2011

We only need to set max_pfn_mapped to the last pfn mapped on x86_64 to
make sure that cleanup_highmap doesn't remove important mappings at
_end.

We don't need to do this on x86_32 because cleanup_highmap is not called
on x86_32. Besides lowering max_pfn_mapped on x86_32 has the unwanted
side effect of limiting the amount of memory available for the 1:1
kernel pagetable allocation.

This patch reverts the x86_32 part of the original patch.

CC: stable@kernel.org
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a91d9287

04 6月, 2011 1 次提交

xen: off by one errors in multicalls.c · f124c6ae

由 Dan Carpenter 提交于 6月 03, 2011

b->args[] has MC_ARGS elements, so the comparison here should be
">=" instead of ">".  Otherwise we read past the end of the array
one space.

CC: stable@kernel.org
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

f124c6ae

21 5月, 2011 9 次提交

J
xen: fix compile without CONFIG_XEN_DEBUG_FS · 4bf0ff24
由 Jeremy Fitzhardinge 提交于 5月 20, 2011
```
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
```
4bf0ff24
J
Use arbitrary_virt_to_machine() to deal with ioremapped pud updates. · 2a001f64
由 Jeremy Fitzhardinge 提交于 4月 06, 2011
```
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
```
2a001f64
J
Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates. · f05608d2
由 Jeremy Fitzhardinge 提交于 3月 24, 2011
```
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
```
f05608d2
J
xen/mmu: remove all ad-hoc stats stuff · c86d8077
由 Jeremy Fitzhardinge 提交于 12月 16, 2010
```
To make way for tracing.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
```
c86d8077

xen: use normal virt_to_machine for ptes · d5108316

由 Jeremy Fitzhardinge 提交于 12月 22, 2010

We no longer support HIGHPTE allocations, so ptes should always be
within the kernel's direct map, and don't need pagetable walks
to convert to machine addresses.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

d5108316

J
xen: make a pile of mmu pvop functions static · 4c13629f
由 Jeremy Fitzhardinge 提交于 12月 01, 2010
```
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
```
4c13629f

xen: condense everything onto xen_set_pte · 4a35c13c

由 Jeremy Fitzhardinge 提交于 12月 01, 2010

xen_set_pte_at and xen_clear_pte are essentially identical to
xen_set_pte, so just make them all common.

When batched set_pte and pte_clear are the same, but the unbatch operation
must be different: they need to update the two halves of the pte in
different order.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

4a35c13c

xen: use mmu_update for xen_set_pte_at() · a99ac5e8

由 Jeremy Fitzhardinge 提交于 12月 01, 2010

In principle update_va_mapping is a good match for set_pte_at, since
it gets the address being mapped, which allows Xen to use its linear
pagetable mapping.

However that assumes that the pmd for the address is attached to the
current pagetable, which may not be true for a given user address space
because the kernel pmd is not shared (at least on 32-bit guests).
Normally the kernel will automatically sync a missing part of the
pagetable with the init_mm pagetable transparently via faults, but that
fails when a missing address is passed to Xen.

And while the linear pagetable mapping is very useful for 32-bit Xen
(as it avoids an explicit domain mapping), 32-bit Xen is deprecated.
64-bit Xen has all memory mapped all the time, so it makes no real
difference.

The upshot is that we should use mmu_update, since it can operate on
non-current pagetables or detached pagetables.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

a99ac5e8

xen: drop all the special iomap pte paths. · 331468b1

由 Jeremy Fitzhardinge 提交于 12月 01, 2010

Xen can work out when we're doing IO mappings for itself, so we don't
need to do anything special, and the extra tests just clog things up.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

331468b1

19 5月, 2011 6 次提交

arch/x86/xen/smp: Cleanup code/data sections definitions · b53cedeb

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b53cedeb

arch/x86/xen/time: Cleanup code/data sections definitions · fb6ce5de

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

fb6ce5de

arch/x86/xen/xen-ops: Cleanup code/data sections definitions · ad7ba09e

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ad7ba09e

arch/x86/xen/mmu: Cleanup code/data sections definitions · 3f508953

由 Daniel Kiper 提交于 5月 12, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
[v1: Rebased on top of latest linus's to include fixes in mmu.c]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

3f508953

xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · c9ce9e43

由 Konrad Rzeszutek Wilk 提交于 4月 20, 2011

If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

c9ce9e43

xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52

由 Konrad Rzeszutek Wilk 提交于 2月 28, 2011

We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:

(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----

and crashing us.

This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.

On the m2p_remove_override path we provide the same parameter.

Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.

[v1: Added documentation details, made it return -EOPNOTSUPP instead
 of trying to do a half-way implementation]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

d5431d52

13 5月, 2011 9 次提交

arch/x86/xen/setup: Cleanup code/data sections definitions · ae15a3b4

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ae15a3b4

arch/x86/xen/enlighten: Cleanup code/data sections definitions · ad3062a0

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ad3062a0

arch/x86/xen/irq: Cleanup code/data sections definitions · 251511a1

由 Daniel Kiper 提交于 5月 04, 2011

Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

251511a1

xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings · 8c595088

由 Konrad Rzeszutek Wilk 提交于 4月 01, 2011

.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

8c595088

xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit · 0f16d0df

由 Daniel Kiper 提交于 5月 11, 2011

git commit 24bdb0b6 (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that ifdef CONFIG_X86_32 instead of e820_end_of_low_ram_pfn()
find_low_pfn_range() is called (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as ifdef CONFIG_X86_64 then memory hotplug
support for Xen balloon driver (under development) is broken.
Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

0f16d0df

xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings. · 15bfc094

由 Konrad Rzeszutek Wilk 提交于 4月 12, 2011

When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as an 1-1 identity
mapping, but instead use the same case as for E820_RAM.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

15bfc094

xen mmu: fix a race window causing leave_mm BUG() · 7899891c

由 Tian, Kevin 提交于 5月 12, 2011

There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:

---------------------------------
kernel BUG at arch/x86/mm/tlb.c:61!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
CPU 1
Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
RSP: e02b:ffff88002805be48  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
Stack:
 ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
<0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
<0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
Call Trace:
 <IRQ>
 [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
 [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
 [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
 [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
 [<ffffffff81013efe>] xen_do_hyper visor_callback+0x1e/0x30
 <EOI>
 [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
 [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
 [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 [<ffffffff 8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 [<ffffffff81013daa>] ? child_rip+0xa/0x20
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 RSP <ffff88002805be48>
---[ end trace ce9cee6832a9c503 ]---

Tested-by: Maoxiaoyun<tinnycloud@hotmail.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

7899891c

x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b

由 Stefano Stabellini 提交于 4月 14, 2011

Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.

On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.

A detailed explanation of the reason why this hook is needed follows.

As a consequence of the commit:

commit 4b239f45
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other
purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.

Another way to fix this is without using the hook is by adding a 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
counterpart, but that is just nasty.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

279b706b

Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high"" · 92bdaef7

由 Konrad Rzeszutek Wilk 提交于 5月 05, 2011

This reverts commit a3864783.

It does not work with certain AMD machines.

last_pfn = 0x100000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 02c3a000
Base memory trampoline at [ffff88000009b000] 9b000 size 20480
init_memory_mapping: 0000000000000000-0000000100000000
 0000000000 - 0100000000 page 4k
kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
init_memory_mapping: 0000000100000000-00000001e0800000
 0100000000 - 01e0800000 page 4k
kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
xen: setting RW the range fffdc000 - 100000000
RAMDISK: 0203b000 - 02c3a000
No NUMA configuration found
Faking a node at 0000000000000000-00000001e0800000
NUMA: Using 63 for the hash shift.
Initmem setup node 0 0000000000000000-00000001e0800000
  NODE_DATA [00000001dfffb000 - 00000001dfffffff]
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
PGD 0
Oops: 0003 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
Stack:
 0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
 ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
 ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
Call Trace:
 [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6f67>] numa_init+0x6c/0x70
 [<ffffffff81cf7057>] initmem_init+0x39/0x3b
 [<ffffffff81ce5865>] setup_arch+0x64e/0x769
 [<ffffffff815e43c1>] ? printk+0x51/0x53
 [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
 [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
 RSP <ffffffff81c01e38>
CR2: 0000000000000000
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

92bdaef7

10 5月, 2011 1 次提交

treewide: fix a few typos in comments · 70f23fd6

由 Justin P. Mattock 提交于 5月 10, 2011

- kenrel -> kernel
- whetehr -> whether
- ttt -> tt
- sss -> ss
Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

70f23fd6

03 5月, 2011 2 次提交

xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7

由 Stefano Stabellini 提交于 4月 12, 2011

mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b9269dc7

xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783

由 Konrad Rzeszutek Wilk 提交于 4月 29, 2011

As a consequence of the commit:

commit 4b239f45
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

it causes the Linux kernel to crash under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a3864783

20 4月, 2011 3 次提交

xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · 8a91707d

由 Konrad Rzeszutek Wilk 提交于 4月 20, 2011

If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

8a91707d

xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 · ee176455

由 Stefano Stabellini 提交于 4月 19, 2011

The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64, in fact early_ioremap is not used at all to setup the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.

Fix the issue and improve the readability of the code providing two
different implementation of mask_rw_pte for x86_32 and x86_64.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ee176455

xen: do not create the extra e820 region at an addr lower than 4G · 24bdb0b6

由 Stefano Stabellini 提交于 4月 12, 2011

Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().

It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

24bdb0b6

18 4月, 2011 1 次提交

xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · cf8d9163

由 Konrad Rzeszutek Wilk 提交于 2月 28, 2011

We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:

(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----

and crashing us.

This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.

On the m2p_remove_override path we provide the same parameter.

Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.

[v1: Added documentation details, made it return -EOPNOTSUPP instead
 of trying to do a half-way implementation]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cf8d9163

14 4月, 2011 1 次提交

sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() · 184748cc

由 Peter Zijlstra 提交于 4月 05, 2011

For future rework of try_to_wake_up() we'd like to push part of that
function onto the CPU the task is actually going to run on.

In order to do so we need a generic callback from the existing scheduler IPI.

This patch introduces such a generic callback: scheduler_ipi() and
implements it as a NOP.

BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions!
Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NChris Metcalf <cmetcalf@tilera.com>
Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
Reviewed-by: NFrank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl

184748cc

12 4月, 2011 1 次提交

fix XEN_SAVE_RESTORE Kconfig dependencies · d419e4c0

由 Shriram Rajagopalan 提交于 4月 11, 2011

Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS.
Remove XEN_SAVE_RESTORE dependency from PM_SLEEP.
Signed-off-by: NShriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

d419e4c0

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功