1. 04 October 2012 (1 commit)
    • xen pv-on-hvm: add pfn_is_ram helper for kdump · 34b6f01a
      Committed by Olaf Hering
      Register a pfn_is_ram helper to speed up reading /proc/vmcore in the kdump
      kernel. See the commit message of 997c136f ("fs/proc/vmcore.c: add hook
      to read_from_oldmem() to check for non-ram pages") for details.
      
      It makes use of a new hvmop HVMOP_get_mem_type which was introduced in
      xen 4.2 (23298:26413986e6e0) and backported to 4.1.1.
      
      The new function is currently only enabled for reading /proc/vmcore.
      Later it will also be used for the kexec kernel. Since that requires
      more changes in the generic kernel, keep it static for the time being
      (a hedged sketch of the hook follows this entry).
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
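      A hedged sketch of what such a hook can look like, based on the HVMOP_get_mem_type
      interface described above (field and constant spellings follow the public Xen headers
      as I recall them; this is a sketch, not the actual patch):

      /* Sketch: ask the hypervisor what backs a PFN; report "is RAM" only for
       * normal memory so /proc/vmcore reads can skip ballooned or MMIO pages. */
      static int xen_oldmem_pfn_is_ram(unsigned long pfn)
      {
              struct xen_hvm_get_mem_type a = {
                      .domid = DOMID_SELF,
                      .pfn   = pfn,
              };

              if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
                      return -ENXIO;          /* hypercall not available */

              return a.mem_type == HVMMEM_ram_rw || a.mem_type == HVMMEM_ram_ro;
      }

      /* Registered once during PV-on-HVM setup so read_from_oldmem() consults it:
       *     register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);                  */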
  2. 24 September 2012 (2 commits)
  3. 18 September 2012 (2 commits)
  4. 12 September 2012 (5 commits)
  5. 06 September 2012 (1 commit)
  6. 05 September 2012 (2 commits)
    • xen: fix logical error in tlb flushing · ce7184bd
      Committed by Alex Shi
      While TLB_FLUSH_ALL gets passed as the 'end' argument to
      flush_tlb_others(), the Xen code was checking its 'start'
      parameter instead. That can yield an incorrect op.cmd of
      MMUEXT_INVLPG_MULTI instead of MMUEXT_TLB_FLUSH_MULTI, so some
      pages cannot be flushed from the TLB.
      
      This patch fixes the issue (a sketch of the corrected check follows this entry).
      Reported-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Yongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
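      A hedged sketch of the corrected check in the Xen flush_tlb_others path
      (simplified from the description above; not the literal hunk):

      /* Sketch: the "flush everything" request is signalled through 'end',
       * so that is the parameter the command selection must look at. */
      if (end == TLB_FLUSH_ALL) {
              args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;   /* flush the whole TLB */
      } else {
              args->op.cmd = MMUEXT_INVLPG_MULTI;      /* flush a single page */
              args->op.arg1.linear_addr = start;
      }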
    • xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Committed by Konrad Rzeszutek Wilk
      We would traverse the full P2M top directory (from 0 to MAX_DOMAIN_PAGES,
      inclusive) when trying to figure out whether we could re-use some of the
      P2M middle leafs (a minimal illustration of the off-by-one follows this entry).
      
      That meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us, p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
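      The bug is the classic inclusive loop bound; a minimal, self-contained
      illustration (hypothetical sizes, nothing here is the kernel code):

      #include <assert.h>
      #include <stdio.h>

      #define MAX_ENTRIES 512                 /* stand-in for the P2M top size */

      static void touch_entry(unsigned int idx)
      {
              assert(idx < MAX_ENTRIES);      /* plays the BUG_ON() role above */
      }

      int main(void)
      {
              unsigned int i;

              /* Buggy bound: '<=' walks one past the end and trips the assert:
               *     for (i = 0; i <= MAX_ENTRIES; i++) touch_entry(i);         */

              for (i = 0; i < MAX_ENTRIES; i++)   /* fixed: strictly less than */
                      touch_entry(i);

              printf("walked %u entries\n", i);
              return 0;
      }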
  7. 28 August 2012 (1 commit)
  8. 23 August 2012 (18 commits)
    • xen/mmu: If the revector fails, don't attempt to revector anything else. · 32873187
      Committed by Konrad Rzeszutek Wilk
      If the P2M revectoring would fail, we would try to continue on by
      cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap
      is greedy and erases the PMD on both boundaries. Since the P2M
      array can share the PMD, we would wipe out part of the __ka
      that is still used in the P2M tree to point to P2M leafs.
      
      This fixes it by bypassing the revectoring and continuing on.
      If the revector fails, a nice WARN is printed so we can still
      troubleshoot this.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Committed by Konrad Rzeszutek Wilk
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is, the one that was
      grafted onto the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      
      recycles said slots, replacing the P2M tree leafs that point at
      &mfn_list[xx] with p2m_identity or p2m_missing,
      
      and re-uses those P2M array sections for other P2M tree leafs.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      b/c when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except that when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated one. Since that is no longer the case, as we now have
      holes in the P2M that point to p2m_missing or p2m_identity, we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Release just the MFN list, not MFN list and part of pagetables. · 785f6231
      Committed by Konrad Rzeszutek Wilk
      We call memblock_reserve for [start of mfn list] -> [PMD-aligned end
      of mfn list] instead of [start of mfn list] -> [page-aligned end of mfn list].
      
      This has the disastrous effect that if at bootup the end of mfn_list is
      not PMD aligned, we end up returning to memblock parts of the region
      past the mfn_list array. And those parts are the PTE tables, so we
      see this at bootup (a sketch of the intended alignment follows this entry):
      
      Write protecting the kernel read-only data: 10240k
      Freeing unused kernel memory: 1860k freed
      Freeing unused kernel memory: 200k freed
      (XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26)
      ...
      (XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0
      .. and so on.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
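      A hedged illustration of the alignment point (not the actual hunk; whichever
      memblock call covers the array, the length must be page-aligned, not PMD-aligned):

      unsigned long start = __pa(xen_start_info->mfn_list);
      unsigned long len   = xen_start_info->nr_pages * sizeof(unsigned long);

      /* Right: cover exactly the array, rounded to page granularity. */
      memblock_free(start, PAGE_ALIGN(len));

      /* Wrong: rounding to a PMD (2 MB) boundary also covers the bootstrap
       * PTE pages that sit just past the array and hands them back too:
       *     memblock_free(start, ALIGN(len, PMD_SIZE));                    */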
    • xen/mmu: Remove from __ka space PMD entries for pagetables. · 3aca7fbc
      Committed by Konrad Rzeszutek Wilk
      Please first read the description in "xen/mmu: Copy and revector the
      P2M tree."
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      The xen_remove_p2m_tree and code around has ripped out the __ka for
      the old P2M array.
      
      Here we continue on doing it to where the Xen page-tables were.
      It is safe to do it, as the page-tables are addressed using __va.
      For good measure we delete anything that is within MODULES_VADDR
      and up to the end of the PMD.
      
      At this point the __ka only contains PMD entries for the start
      of the kernel up to __brk.
      
      [v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Copy and revector the P2M tree. · 7f914062
      Committed by Konrad Rzeszutek Wilk
      Please first read the description in "xen/p2m: Add logic to revector a
      P2M tree to use __va leafs" patch.
      
      The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
      copies the contents of the old one into it, and returns the new one.
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      We have revectored the P2M tree (and the one for save/restore as well)
      to use new shiny __va address to new MFNs. The xen_start_info
      has been taken care of already in 'xen_setup_kernel_pagetable()' and
      xen_start_info->shared_info in 'xen_setup_shared_info()', so
      we are free to roam and delete PMD entries - which is exactly what
      we are going to do. We rip out the __ka for the old P2M array.
      
      [v1: Fix smatch warnings]
      [v2: memset was doing 0 instead of 0xff]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Committed by Konrad Rzeszutek Wilk
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks as so:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables are taking
      up a chunk of the __ka address space. That is a problem, since
      MODULES_VADDR starts at 0xffffffffa0000000 and the P2M sits
      right in there! During bootup this results in the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as the initial domain on large machines.
      
      So we pick option 2) (a sketch of the revectoring idea follows this entry).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
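      A hedged sketch of what "revectoring" one leaf amounts to (helper name is
      illustrative; the real p2m.c walks the whole tree and handles the
      identity/missing holes described in the later patches):

      /* Illustrative only: copy a middle leaf out of the Xen-supplied array
       * (reached via __ka) into a freshly allocated page, and hand back the
       * __va alias so the __ka mapping can be torn down afterwards. */
      static unsigned long *revector_one_leaf(unsigned long *old_leaf)
      {
              unsigned long *copy = extend_brk(PAGE_SIZE, PAGE_SIZE);

              copy_page(copy, old_leaf);          /* preserve the MFN entries */
              return (unsigned long *)__va(__pa(copy));
      }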
    • xen/mmu: Recycle the Xen provided L4, L3, and L2 pages · 488f046d
      Committed by Konrad Rzeszutek Wilk
      As we are not using them. We end up only using the L1 pagetables
      and grafting those to our page-tables.
      
      [v1: Per Stefano's suggestion squashed two commits]
      [v2: Per Stefano's suggestion simplified loop]
      [v3: Fix smatch warnings]
      [v4: Add more comments]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: For 64-bit do not call xen_map_identity_early · caaf9ecf
      Committed by Konrad Rzeszutek Wilk
      Because we do not need it. During startup Xen provides
      us with all the initial memory mappings that we need to function.
      
      The initial mapping extends up to the bootstrap stack, which means
      we can reference, using __ka, everything up to 4.f):
      
      (from xen/interface/xen.h):
      
       4. This the order of bootstrap elements in the initial virtual region:
         a. relocated kernel image
         b. initial ram disk              [mod_start, mod_len]
         c. list of allocated page frames [mfn_list, nr_pages]
         d. start_info_t structure        [register ESI (x86)]
         e. bootstrap page tables         [pt_base, CR3 (x86)]
         f. bootstrap stack               [register ESP (x86)]
      
      (initial ram disk may be omitted).
      
      [v1: More comments in git commit]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: use copy_page instead of memcpy. · ae895ed7
      Committed by Konrad Rzeszutek Wilk
      After all, this is what it is there for.
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
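      The change is a mechanical substitution; an illustrative before/after
      (variable names are made up):

      /* Before: open-coded byte copy of exactly one page. */
      memcpy(dst_page, src_page, PAGE_SIZE);

      /* After: the arch-optimized helper that exists for exactly this case. */
      copy_page(dst_page, src_page);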
    • xen/mmu: Provide comments describing the _ka and _va aliasing issue · 4fac153a
      Committed by Konrad Rzeszutek Wilk
      Which is that the level2_kernel_pgt (__ka virtual addresses)
      and level2_ident_pgt (__va virtual address) contain the same
      PMD entries. So if you modify a PTE in __ka, it will be reflected
      in __va (and vice-versa).
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. · 3699aad0
      Committed by Konrad Rzeszutek Wilk
      We don't need to return the new PGD - as we do not use it.
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Committed by Konrad Rzeszutek Wilk
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear which region is reserved. Sadly we ran
      into the problem wherein, on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so let's just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. · 6d7083ee
      Committed by Konrad Rzeszutek Wilk
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: allow privcmd for HVM guests · 1a1d4331
      Committed by Stefano Stabellini
      This patch removes the "return -ENOSYS" for auto_translated_physmap
      guests from privcmd_mmap, thus allowing ARM guests to issue privcmd
      mmap calls. However, privcmd mmap calls are still going to fail for HVM
      and hybrid guests on x86, because the xen_remap_domain_mfn_range
      implementation is currently PV-only.
      
      Changes in v2:
      
      - better commit message;
      - return -EINVAL from xen_remap_domain_mfn_range if
        auto_translated_physmap.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: Introduce xen_pfn_t for pfn and mfn types · bd3f79b7
      Committed by Stefano Stabellini
      All the original Xen headers use xen_pfn_t as the mfn and pfn type; however,
      when they were imported into Linux, xen_pfn_t was replaced with
      unsigned long. That might work for x86 and ia64 but it does not for ARM.
      Bring back xen_pfn_t and let each architecture define it as it sees fit
      (an illustrative sketch follows this entry).
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
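      A hedged sketch of the per-architecture definitions this enables (widths as
      I understand the ports; exact header locations omitted):

      /* x86 / ia64: frame numbers fit in the native word. */
      typedef unsigned long xen_pfn_t;

      /* arm: a 32-bit guest with LPAE can see frames above 4 GB, so the
       * frame-number type must be 64-bit regardless of the word size. */
      typedef uint64_t xen_pfn_t;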
    • xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. · c96aae1f
      Committed by Konrad Rzeszutek Wilk
      When we are finished returning PFNs to the hypervisor, populating
      them back, and marking the E820 MMIO regions and E820 gaps
      as IDENTITY_FRAMEs, we then call into the P2M code to set the areas
      that can be used for ballooning. We were off by one, and ended up
      over-writing a P2M entry that most likely was an IDENTITY_FRAME.
      For example:
      
      1-1 mapping on 40000->40200
      1-1 mapping on bc558->bc5ac
      1-1 mapping on bc5b4->bc8c5
      1-1 mapping on bc8c6->bcb7c
      1-1 mapping on bcd00->100000
      Released 614 pages of unused memory
      Set 277889 page(s) to 1-1 mapping
      Populating 40200-40466 pfn range: 614 pages added
      
      => here we set the P2M tree from 40466 up to bc559 to
      INVALID_P2M_ENTRY. We should have stopped at bc558.
      
      The end result is that anybody trying to construct a PTE for
      PFN bc558 ends up with ~PAGE_PRESENT (a small illustration of
      the boundary follows this entry).
      
      CC: stable@vger.kernel.org
      Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
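      A hedged, purely illustrative rendering of the boundary (the helper and the
      variables are hypothetical, not the setup.c code):

      unsigned long populated_end  = 0x40466;  /* first PFN past what we re-populated */
      unsigned long identity_start = 0xbc558;  /* first PFN of the next 1-1 region    */
      unsigned long pfn;

      /* Wrong: '<=' also clobbers the identity entry at bc558:
       *     for (pfn = populated_end; pfn <= identity_start; pfn++) ...            */

      for (pfn = populated_end; pfn < identity_start; pfn++)   /* half-open range   */
              mark_pfn_invalid(pfn);                            /* hypothetical helper */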
    • x86, microcode, AMD: Fix broken ucode patch size check · 36bf50d7
      Committed by Andreas Herrmann
      This issue was recently observed on an AMD C-50 CPU where a patch of
      maximum size was applied.
      
      Commit be62adb4 ("x86, microcode, AMD: Simplify ucode verification")
      added current_size in get_matching_microcode(). This is calculated as
      the size of the ucode patch + 8 (i.e. the size of the header). Later this is
      compared against the maximum possible ucode patch size for a CPU family,
      and of course this check fails if the patch already has the maximum size.
      
      Cc: <stable@vger.kernel.org> [3.3+]
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • KVM: x86 emulator: use stack size attribute to mask rsp in stack ops · 5ad105e5
      Committed by Avi Kivity
      The sub-register used to access the stack (sp, esp, or rsp) is not
      determined by the address size attribute like other memory references,
      but by the stack segment's B bit (if not in x86_64 mode).
      
      Fix by using the existing stack_mask() to figure out the correct mask.
      
      This long-existing bug was exposed by a combination of a27685c3
      (emulate invalid guest state by default), which causes many more
      instructions to be emulated, and a seabios change (possibly a bug) which
      causes the high 16 bits of esp to become polluted across calls to real
      mode software interrupts.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
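      A hedged sketch of the masking idea (the real emulator reads SS through
      ctxt->ops; ss_is_big() below is a made-up stand-in for checking the SS
      descriptor's B bit, so this only shows the shape of the decision):

      /* Approximate: stack-pointer width comes from the stack segment, not
       * from the instruction's address-size attribute. */
      static unsigned long stack_ptr_mask(struct x86_emulate_ctxt *ctxt)
      {
              if (ctxt->mode == X86EMUL_MODE_PROT64)
                      return ~0UL;                         /* full 64-bit rsp  */
              return ss_is_big(ctxt) ? 0xffffffffUL        /* 32-bit esp       */
                                     : 0xffffUL;           /* 16-bit sp        */
      }

      /* Pushes and pops then adjust only the masked bits:
       *     rsp = (rsp & ~mask) | ((rsp + inc) & mask);                       */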
  9. 22 August 2012 (8 commits)
    • KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended · 35f2d16b
      Committed by Takuya Yoshikawa
      Although the possible race described in
      
        commit 85b70591
        KVM: MMU: fix shrinking page from the empty mmu
      
      was correct, the real cause of that issue was a more trivial bug of
      mmu_shrink() introduced by
      
        commit 19526396
        KVM: MMU: do not iterate over all VMs in mmu_shrink()
      
      Here is the bug:
      
      	if (kvm->arch.n_used_mmu_pages > 0) {
      		if (!nr_to_scan--)
      			break;
      		continue;
      	}
      
      We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
      in other words we try to shrink empty ones by mistake.
      
      This patch reverses the logic so that mmu_shrink() can free pages from
      the first VM whose n_used_mmu_pages is not zero (a sketch of the reversed
      check follows this entry).  Note that we also add comments explaining the
      role of nr_to_scan, which is not practically important now, hoping this
      will be improved in the future.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
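      A hedged sketch of the reversed check, paraphrasing the loop body quoted
      above (not the literal patch):

      	/* Skip VMs with nothing to free; only spend the nr_to_scan budget
      	 * on VMs that actually hold mmu pages. */
      	if (!kvm->arch.n_used_mmu_pages)
      		continue;

      	if (!nr_to_scan--)
      		break;

      	/* ... zap pages from this VM ... */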
    • x86/alternatives: Fix p6 nops on non-modular kernels · cb09cad4
      Committed by Avi Kivity
      Probably a leftover from the early days of self-patching, p6nops
      are marked __initconst_or_module, which causes them to be
      discarded in a non-modular kernel.  If something later triggers
      patching, it will overwrite kernel code with garbage.
      Reported-by: Tomas Racek <tracek@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Cc: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: qemu-devel@nongnu.org
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fixup_irq: Use cpu_online_mask instead of cpu_all_mask · 2530cd4f
      Committed by Liu, Chuansheng
      When one CPU is going down and that CPU is the last one in an irq's
      affinity mask, the current code sets cpu_all_mask as the new
      affinity for that irq.
      
      But for some systems (such as Medfield Android mobile devices) the
      firmware sends the interrupt to each CPU in the irq affinity
      mask, averaged, and cpu_all_mask includes all potential CPUs,
      i.e. offline ones as well.
      
      So replace cpu_all_mask with cpu_online_mask (a simplified sketch
      follows this entry).
      Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
      Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
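      A hedged, simplified rendering of the substitution in fixup_irqs()
      (surrounding logic trimmed):

      	/* If no online CPU is left in the irq's affinity mask, fall back to
      	 * the CPUs that are actually online, not every possible CPU. */
      	if (!cpumask_intersects(affinity, cpu_online_mask))
      		affinity = cpu_online_mask;        /* was: cpu_all_mask */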
    • x86/spinlocks: Fix comment in spinlock.h · 83be4ffa
      Committed by Richard Weinberger
      This comment is no longer true.  We support up to 2^16 CPUs
      because __ticket_t is a u16 if NR_CPUS is larger than 256.
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm: hugetlbfs: correctly populate shared pmd · eb48c071
      Committed by Michal Hocko
      Each page mapped in a process's address space must be correctly
      accounted for in _mapcount.  Normally the rules for this are
      straightforward but hugetlbfs page table sharing is different.  The page
      table pages at the PMD level are reference counted while the mapcount
      remains the same.
      
      If this accounting is wrong, it causes bugs like this one reported by
      Larry Woodman:
      
        kernel BUG at mm/filemap.c:135!
        invalid opcode: 0000 [#1] SMP
        CPU 22
        Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
        Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
        RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
        Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
        Call Trace:
          delete_from_page_cache+0x40/0x80
          truncate_hugepages+0x115/0x1f0
          hugetlbfs_evict_inode+0x18/0x30
          evict+0x9f/0x1b0
          iput_final+0xe3/0x1e0
          iput+0x3e/0x50
          d_kill+0xf8/0x110
          dput+0xe2/0x1b0
          __fput+0x162/0x240
      
      During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
      shared page tables with the check dst_pte == src_pte.  The logic is if
      the PMD page is the same, they must be shared.  This assumes that the
      sharing is between the parent and child.  However, if the sharing is
      with a different process entirely then this check fails as in this
      diagram:
      
        parent
          |
          ------------>pmd
                       src_pte----------> data page
                                              ^
        other--------->pmd--------------------|
                        ^
        child-----------|
                       dst_pte
      
      For this situation to occur, it must be possible for Parent and Other to
      have faulted and failed to share page tables with each other.  This is
      possible due to the following style of race.
      
        PROC A                                          PROC B
        copy_hugetlb_page_range                         copy_hugetlb_page_range
          src_pte == huge_pte_offset                      src_pte == huge_pte_offset
          !src_pte so no sharing                          !src_pte so no sharing
      
        (time passes)
      
        hugetlb_fault                                   hugetlb_fault
          huge_pte_alloc                                  huge_pte_alloc
            huge_pmd_share                                 huge_pmd_share
              LOCK(i_mmap_mutex)
              find nothing, no sharing
              UNLOCK(i_mmap_mutex)
                                                            LOCK(i_mmap_mutex)
                                                            find nothing, no sharing
                                                            UNLOCK(i_mmap_mutex)
            pmd_alloc                                       pmd_alloc
            LOCK(instantiation_mutex)
            fault
            UNLOCK(instantiation_mutex)
                                                        LOCK(instantiation_mutex)
                                                        fault
                                                        UNLOCK(instantiation_mutex)
      
      These two processes are now pointing to the same data page but are not
      sharing page tables, because the opportunity was missed.  When either
      process later forks, the src_pte == dst_pte check is potentially insufficient.
      As the check falls through, the wrong PTE information is copied in
      (harmless but wrong) and the mapcount is bumped for a page mapped by a
      shared page table, leading to the BUG_ON.
      
      This patch addresses the issue by moving pmd_alloc into huge_pmd_share,
      which guarantees that the shared pud is populated in the same critical
      section as the pmd.  This also means that the huge_pte_offset test in
      huge_pmd_share is now serialized correctly, which in turn means that
      sharing will succeed more often, as the racing tasks see the pud and pmd
      populated together (a simplified sketch follows this entry).
      
      Race identified and changelog written mostly by Mel Gorman.
      
      [akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
      Reported-by: Larry Woodman <lwoodman@redhat.com>
      Tested-by: Larry Woodman <lwoodman@redhat.com>
      Reviewed-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
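      A hedged, heavily simplified sketch of the restructuring (the real
      mm/hugetlb.c walks the mapping's VMA tree to find a sharing candidate;
      only the lock/alloc ordering is shown, assuming the enclosing
      huge_pte_alloc()-style parameters mm, vma, addr and pud):

      	pte_t *pte;
      	struct address_space *mapping = vma->vm_file->f_mapping;

      	mutex_lock(&mapping->i_mmap_mutex);

      	/* ... look for another VMA already mapping this range; if found,
      	 *     take a reference on its PMD page and pud_populate() with it ... */

      	/* Allocating the pmd here, under the same lock, is the point of the
      	 * patch: a racing task can no longer observe the pud half-populated. */
      	pte = (pte_t *)pmd_alloc(mm, pud, addr);

      	mutex_unlock(&mapping->i_mmap_mutex);
      	return pte;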
    • xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. · b8b0f559
      Committed by Konrad Rzeszutek Wilk
      There is no need for those functions/variables to be visible. Make them
      static and also fix the compile warnings of this sort:
      
      drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static?
      
      Some of them just require including the header file that
      declares the functions.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen: missing includes · 4d9310e3
      Committed by Stefano Stabellini
      Changes in v2:
      - remove pvclock hack;
      - remove include linux/types.h from xen/interface/xen.h.
      v3:
      - Compile under IA64
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB. · fc2341df
      Committed by Konrad Rzeszutek Wilk
      If a PV guest is booted, the native SWIOTLB should not be
      turned on. It does not help us (we don't have any PCI devices)
      and it eats 64MB of good memory. In the case of PV guests
      with PCI devices we need the Xen-SWIOTLB instead (a sketch of
      the decision follows this entry).
      
      [v1: Rewrite comment per Stefano's suggestion]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
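      A hedged sketch of the decision being described (the function name, the
      structure, and the pci_passthrough flag are illustrative, not the actual
      pci-swiotlb-xen.c change):

      /* Illustrative only: on a PV guest the ">4GB on 64-bit" heuristic that
       * turns on the native SWIOTLB does not apply; either the Xen-aware
       * SWIOTLB is needed (dom0 / PCI passthrough) or none at all. */
      int __init xen_swiotlb_decide_sketch(void)
      {
              if (!xen_pv_domain())
                      return 0;                 /* HVM: leave the native logic alone */

              if (xen_initial_domain() || pci_passthrough /* hypothetical flag */)
                      xen_swiotlb = 1;          /* use the Xen-aware SWIOTLB          */

              swiotlb = 0;                      /* never enable the native one        */
              return xen_swiotlb;
      }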