提交 · c9f6e9977e38de15da96b732a8dec0ef56cbf977 · openanolis / cloud-kernel

22 1月, 2014 1 次提交

xen/pvh: Set X86_CR0_WP and others in CR0 (v2) · c9f6e997

由 Roger Pau Monne 提交于 1月 20, 2014

otherwise we will get for some user-space applications
that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
end up hitting an assert in glibc manifested by:

general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in
libc-2.13.so[7f807209e000+180000]

This is due to the nature of said operations which sets and clears
the PID.  "In the successful one I can see that the page table of
the parent process has been updated successfully to use a
different physical page, so the write of the tid on
that page only affects the child...

On the other hand, in the failed case, the write seems to happen before
the copy of the original page is done, so both the parent and the child
end up with the same value (because the parent copies the page after
the write of the child tid has already happened)."
(Roger's analysis). The nature of this is due to the Xen's commit
of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5
"x86/pvh: set only minimal cr0 and cr4 flags in order to use paging"
the CR0_WP was removed so COW features of the Linux kernel were not
operating properly.

While doing that also update the rest of the CR0 flags to be inline
with what a baremetal Linux kernel would set them to.

In 'secondary_startup_64' (baremetal Linux) sets:

X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
X86_CR0_AM | X86_CR0_PG

The hypervisor for HVM type guests (which PVH is a bit) sets:
X86_CR0_PE | X86_CR0_ET | X86_CR0_TS
For PVH it specifically sets:
X86_CR0_PG

Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE  |
X86_CR0_WP | X86_CR0_AM to have full parity.
Signed-off-by: NRoger Pau Monne <roger.pau@citrix.com>
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Took out the cr4 writes to be a seperate patch]
[v2: 0-DAY kernel found xen_setup_gdt to be missing a static]

c9f6e997

10 1月, 2014 1 次提交

xen/pvh: Use 'depend' instead of 'select'. · 54d44eb3

由 Konrad Rzeszutek Wilk 提交于 1月 10, 2014

The usage of 'select' means it will enable the CONFIG
options without checking their dependencies. That meant
we would inadvertently turn on CONFIG_XEN_PVHM while its
core dependency (CONFIG_PCI) was turned off.

This patch fixes the warnings and compile failures:

warning: (XEN_PVH) selects XEN_PVHVM which has unmet direct
dependencies (HYPERVISOR_GUEST && XEN && PCI && X86_LOCAL_APIC)
Reported-by: NJim Davis <jim.epost@gmail.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

54d44eb3

07 1月, 2014 2 次提交

xen/pvh: remove duplicated include from enlighten.c · 5602aba8

由 Wei Yongjun 提交于 1月 07, 2014

Remove duplicated include.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

5602aba8

xen/pvh: Fix compile issues with xen_pvh_domain() · 0869a642

由 Konrad Rzeszutek Wilk 提交于 1月 07, 2014

Oddly enough it compiles for my ancient compiler but with
the supplied .config it does blow up. Fix is easy enough.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Reported-by: NJim Davis <jim.epost@gmail.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

0869a642

06 1月, 2014 15 次提交

xen/pvh: Support ParaVirtualized Hardware extensions (v3). · 4e903a20

由 Mukesh Rathor 提交于 12月 31, 2013

PVH allows PV linux guest to utilize hardware extended capabilities,
such as running MMU updates in a HVM container.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Use .ascii and .asciz to define xen feature string. Note, the PVH
string must be in a single line (not multiple lines with \) to keep the
assembler from putting null char after each string before \.
This patch allows it to be configured and enabled.

We also use introduce the 'XEN_ELFNOTE_SUPPORTED_FEATURES' ELF note to
tell the hypervisor that 'hvm_callback_vector' is what the kernel
needs. We can not put it in 'XEN_ELFNOTE_FEATURES' as older hypervisor
parse fields they don't understand as errors and refuse to load
the kernel. This work-around fixes the problem.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

4e903a20

xen/pvh: Piggyback on PVHVM for grant driver (v4) · 6926f6d6

由 Konrad Rzeszutek Wilk 提交于 1月 03, 2014

In PVH the shared grant frame is the PFN and not MFN,
hence its mapped via the same code path as HVM.

The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and have an
ioremap area - instead we use balloon memory and stitch
all of the non-contingous pages in a virtualized area.

That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to lookup the
old PFN for every iteration instead of assuming a flat
contingous PFN allocation.

Lastly, we only use v1 for grants. This is because PVHVM
is not able to use v2 due to no XENMEM_add_to_physmap
calls on the error status page (see commit
69e8f430
 xen/granttable: Disable grant v2 for HVM domains.)

Until that is implemented this workaround has to
be in place.

Also per suggestions by Stefano utilize the PVHVM paths
as they share common functionality.

v2 of this patch moves most of the PVH code out in the
arch/x86/xen/grant-table driver and touches only minimally
the generic driver.

v3, v4: fixes us some of the code due to earlier patches.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

6926f6d6

xen/pvh: Piggyback on PVHVM for event channels (v2) · 2771374d

由 Mukesh Rathor 提交于 12月 11, 2013

PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.

The most notable PV interfaces are the XenBus and event channels.

We will piggyback on how the event channel mechanism is
used in PVHVM - that is we want the normal native IRQ mechanism
and we will install a vector (hvm callback) for which we
will call the event channel mechanism.

This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

2771374d

xen/pvh: Update E820 to work with PVH (v2) · 9103bb0f

由 Mukesh Rathor 提交于 12月 13, 2013

In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

9103bb0f

xen/pvh: Secondary VCPU bringup (non-bootup CPUs) · 5840c84b

由 Mukesh Rathor 提交于 12月 13, 2013

The VCPU bringup protocol follows the PV with certain twists.
From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

5840c84b

xen/pvh: Load GDT/GS in early PV bootup code for BSP. · 8d656bbe

由 Mukesh Rathor 提交于 12月 13, 2013

During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

8d656bbe

xen/pvh: Setup up shared_info. · 4dd322bc

由 Mukesh Rathor 提交于 12月 31, 2013

For PVHVM the shared_info structure is provided via the same way
as for normal PV guests (see include/xen/interface/xen.h).

That is during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Then later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.

The 'xen_setup_shared_info' is all setup to work with auto-xlat
guests, but there are two functions which it calls that are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies the P2M code (xen_setup_mfn_list_list)
while the "Piggyback on PVHVM for event channels" modifies
the xen_setup_vcpu_info_placement.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

4dd322bc

xen/pvh/mmu: Use PV TLB instead of native. · 76bcceff

由 Mukesh Rathor 提交于 1月 03, 2014

We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

76bcceff

xen/pvh: MMU changes for PVH (v2) · 4e44e44b

由 Mukesh Rathor 提交于 12月 31, 2013

.. which are surprisingly small compared to the amount for PV code.

PVH uses mostly native mmu ops, we leave the generic (native_*) for
the majority and just overwrite the baremetal with the ones we need.

At startup, we are running with pre-allocated page-tables
courtesy of the tool-stack. But we still need to graft them
in the Linux initial pagetables. However there is no need to
unpin/pin and change them to R/O or R/W.

Note that the xen_pagetable_init due to 7836fec9d0994cc9c9150c5a33f0eb0eb08a335a
"xen/mmu/p2m: Refactor the xen_pagetable_init code." does not
need any changes - we just need to make sure that xen_post_allocator_init
does not alter the pvops from the default native one.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

4e44e44b

xen/mmu: Cleanup xen_pagetable_p2m_copy a bit. · b621e157

由 Konrad Rzeszutek Wilk 提交于 1月 03, 2014

Stefano noticed that the code runs only under 64-bit so
the comments about 32-bit are pointless.

Also we change the condition for xen_revector_p2m_tree
returning the same value (because it could not allocate
a swath of space to put the new P2M in) or it had been
called once already. In such we return early from the
function.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

b621e157

xen/mmu/p2m: Refactor the xen_pagetable_init code (v2). · 32df75cd

由 Konrad Rzeszutek Wilk 提交于 12月 31, 2013

The revectoring and copying of the P2M only happens when
!auto-xlat and on 64-bit builds. It is not obvious from
the code, so lets have seperate 32 and 64-bit functions.

We also invert the check for auto-xlat to make the code
flow simpler.
Suggested-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

32df75cd

xen/pvh: Don't setup P2M tree. · 696fd7c5

由 Konrad Rzeszutek Wilk 提交于 12月 15, 2013

P2M is not available for PVH. Fortunatly for us the
P2M code already has mostly the support for auto-xlat guest thanks to
commit 3d24bbd7
"grant-table: call set_phys_to_machine after mapping grant refs"
which: "
introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs.
translated by swiotlb-xen... " so we don't need to muck much.

with above mentioned "commit you'll get set_phys_to_machine calls
from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
anything with them " (Stefano Stabellini) which is OK - we want
them to be NOPs.

This is because we assume that an "IOMMU is always present on the
plaform and Xen is going to make the appropriate IOMMU pagetable
changes in the hypercall implementation of GNTTABOP_map_grant_ref
and GNTTABOP_unmap_grant_ref, then eveything should be transparent
from PVH priviligied point of view and DMA transfers involving
foreign pages keep working with no issues[sp]

Otherwise we would need a P2M (and an M2P) for PVH priviligied to
track these foreign pages .. (see arch/arm/xen/p2m.c)."
(Stefano Stabellini).

We still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which setups the P2M tree
and gives us virtual address to access them). But we are missing
a check for xen_build_mfn_list_list - which was continuing to setup
the P2M tree and would blow up at trying to get the virtual
address of p2m_missing (which would have been setup by
xen_build_dynamic_phys_to_machine).

Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.

Instead of replicating the check for auto-xlat in enlighten.c
do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
is called also in xen_arch_post_suspend without any checks for
auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
allocate space for an P2M tree.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

696fd7c5

xen/pvh: Early bootup changes in PV code (v4). · d285d683

由 Mukesh Rathor 提交于 12月 13, 2013

We don't use the filtering that 'xen_cpuid' is doing
because the hypervisor treats 'XEN_EMULATE_PREFIX' as
an invalid instruction. This means that all of the filtering
will have to be done in the hypervisor/toolstack.

Without the filtering we expose to the guest the:

 - cpu topology (sockets, cores, etc);
 - the APERF (which the generic scheduler likes to
    use), see  5e626254
    "xen/setup: filter APERFMPERF cpuid feature out"
 - and the inability to figure out whether MWAIT_LEAF
   should be exposed or not. See
   df88b2d9
   "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
 - x2apic, see  4ea9b9ac
   "xen: mask x2APIC feature in PV"

We also check for vector callback early on, as it is a required
feature. PVH also runs at default kernel IOPL.

Finally, pure PV settings are moved to a separate function that are
only called for pure PV, ie, pv with pvmmu. They are also #ifdef
with CONFIG_XEN_PVMMU.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

d285d683

xen/pvh/x86: Define what an PVH guest is (v3). · ddc416cb

由 Mukesh Rathor 提交于 12月 13, 2013

Which is a PV guest with auto page translation enabled
and with vector callback. It is a cross between PVHVM and PV.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Also we use the PV cpuid, albeit we can use the HVM (native) cpuid.
However, we do have a fair bit of filtering in the xen_cpuid and
we can piggyback on that until the hypervisor/toolstack filters
the appropiate cpuids. Once that is done we can swap over to
use the native one.

We setup a Kconfig entry that is disabled by default and
cannot be enabled.

Note that on ARM the concept of PVH is non-existent. As Ian
put it: "an ARM guest is neither PV nor HVM nor PVHVM.
It's a bit like PVH but is different also (it's further towards
the H end of the spectrum than even PVH).". As such these
options (PVHVM, PVH) are never enabled nor seen on ARM
compilations.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ddc416cb

xen/x86: set VIRQ_TIMER priority to maximum · 8785c676

由 David Vrabel 提交于 9月 23, 2013

Commit bee980d9 (xen/events: Handle VIRQ_TIMER before any other hardirq
in event loop) effectively made the VIRQ_TIMER the highest priority event
when using the 2-level ABI.

Set the VIRQ_TIMER priority to the highest so this behaviour is retained
when using the FIFO-based ABI.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

8785c676

04 1月, 2014 2 次提交

xen/pvhvm: Remove the xen_platform_pci int. · 6f6c15ef

由 Konrad Rzeszutek Wilk 提交于 12月 11, 2013

Since we have  xen_has_pv_devices,xen_has_pv_disk_devices,
xen_has_pv_nic_devices, and xen_has_pv_and_legacy_disk_devices
to figure out the different 'unplug' behaviors - lets
use those instead of this single int.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

6f6c15ef

xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4). · 51c71a3b

由 Konrad Rzeszutek Wilk 提交于 11月 26, 2013

The user has the option of disabling the platform driver:
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)

which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
and allow the PV drivers to take over. If the user wishes
to disable that they can set:

  xen_platform_pci=0
  (in the guest config file)

or
  xen_emul_unplug=never
  (on the Linux command line)

except it does not work properly. The PV drivers still try to
load and since the Xen platform driver is not run - and it
has not initialized the grant tables, most of the PV drivers
stumble upon:

input: Xen Virtual Keyboard as /devices/virtual/input/input5
input: Xen Virtual Pointer as /devices/virtual/input/input6M
------------[ cut here ]------------
kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
invalid opcode: 0000 [#1] SMP
Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
RIP: 0010:[<ffffffff813ddc40>]  [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300
Call Trace:
 [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
 [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
 [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
 [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
 [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
 [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
 [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230
 [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
 [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
 [<ffffffff8145e7d9>] driver_attach+0x19/0x20
 [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220
 [<ffffffff8145f1ff>] driver_register+0x5f/0xf0
 [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
 [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
 [<ffffffffa0015000>] ? 0xffffffffa0014fff
 [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
 [<ffffffff81002049>] do_one_initcall+0x49/0x170

.. snip..

which is hardly nice. This patch fixes this by having each
PV driver check for:
 - if running in PV, then it is fine to execute (as that is their
   native environment).
 - if running in HVM, check if user wanted 'xen_emul_unplug=never',
   in which case bail out and don't load any PV drivers.
 - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
   does not exist, then bail out and not load PV drivers.
 - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
   then bail out for all PV devices _except_ the block one.
   Ditto for the network one ('nics').
 - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
   then load block PV driver, and also setup the legacy IDE paths.
   In (v3) make it actually load PV drivers.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it
Reported-by: NAnthony PERARD <anthony.perard@citrix.com>
Reported-and-Tested-by: NFabio Fantoni <fabio.fantoni@m2r.biz>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add extra logic to handle the myrid ways 'xen_emul_unplug'
can be used per Ian and Stefano suggestion]
[v3: Make the unnecessary case work properly]
[v4: s/disks/ide-disks/ spotted by Fabio]
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
CC: stable@vger.kernel.org

51c71a3b

15 11月, 2013 2 次提交

mm: dynamically allocate page->ptl if it cannot be embedded to struct page · 49076ec2

由 Kirill A. Shutemov 提交于 11月 14, 2013

If split page table lock is in use, we embed the lock into struct page
of table's page.  We have to disable split lock, if spinlock_t is too
big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
enabled.

This patch add support for dynamic allocation of split page table lock
if we can't embed it to struct page.

page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t.

The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
pgtable_pmd_page_ctor() for PMD table.  All other helpers converted to
support dynamically allocated page->ptl.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49076ec2

mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKS · 57c1ffce

由 Kirill A. Shutemov 提交于 11月 14, 2013

We're going to introduce split page table lock for PMD level.  Let's
rename existing split ptlock for PTE level to avoid confusion.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NAlex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57c1ffce

09 11月, 2013 2 次提交

pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI · 92c0fd17

由 Stefano Stabellini 提交于 11月 04, 2013

Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

92c0fd17

xen: delete new instances of added __cpuinit · 3b284bde

由 Paul Gortmaker 提交于 11月 08, 2013

commit 6efa20e4
("xen: Support 64-bit PV guest receiving NMIs") and
commit cd9151e2
( "xen/balloon: set a mapping for ballooned out pages")
added new instances of __cpuinit usage.

We removed this a couple versions ago; we now want to remove
the compat no-op stubs.  Introducing new users is not what
we want to see at this point in time, as it will break once
the stubs are gone.

Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

3b284bde

07 11月, 2013 1 次提交

x86/xen: remove deprecated IRQF_DISABLED · 9d71cee6

由 Michael Opdenacker 提交于 9月 07, 2013

This patch proposes to remove the IRQF_DISABLED flag from x86/xen
code. It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: NMichael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

9d71cee6

10 10月, 2013 4 次提交

xen: Fix possible user space selector corruption · 7cde9b27

由 Frediano Ziglio 提交于 10月 10, 2013

Due to the way kernel is initialized under Xen is possible that the
ring1 selector used by the kernel for the boot cpu end up to be copied
to userspace leading to segmentation fault in the userspace.

Xen code in the kernel initialize no-boot cpus with correct selectors (ds
and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
On task context switch (switch_to) we assume that ds, es and cs already
point to __USER_DS and __KERNEL_CSso these selector are not changed.

If processor is an Intel that support sysenter instruction sysenter/sysexit
is used so ds and es are not restored switching back from kernel to
userspace. In the case the selectors point to a ring1 instead of __USER_DS
the userspace code will crash on first memory access attempt (to be
precise Xen on the emulated iret used to do sysexit will detect and set ds
and es to zero which lead to GPF anyway).

Now if an userspace process call kernel using sysenter and get rescheduled
(for me it happen on a specific init calling wait4) could happen that the
ring1 selector is set to ds and es.

This is quite hard to detect cause after a while these selectors are fixed
(__USER_DS seems sticky).

Bisecting the code commit 7076aada appears
to be the first one that have this issue.
Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: NAndrew Cooper <andrew.cooper3@citrix.com>

7cde9b27

swiotlb-xen: use xen_alloc/free_coherent_pages · 1b65c4e5

由 Stefano Stabellini 提交于 10月 10, 2013

Use xen_alloc_coherent_pages and xen_free_coherent_pages to allocate or
free coherent pages.

We need to be careful handling the pointer returned by
xen_alloc_coherent_pages, because on ARM the pointer is not equal to
phys_to_virt(*dma_handle). In fact virt_to_phys only works for kernel
direct mapped RAM memory.
In ARM case the pointer could be an ioremap address, therefore passing
it to virt_to_phys would give you another physical address that doesn't
correspond to it.

Make xen_create_contiguous_region take a phys_addr_t as start parameter to
avoid the virt_to_phys calls which would be incorrect.

Changes in v6:
- remove extra spaces.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

1b65c4e5

xen: make xen_create_contiguous_region return the dma address · 69908907

由 Stefano Stabellini 提交于 10月 09, 2013

Modify xen_create_contiguous_region to return the dma address of the
newly contiguous buffer.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>


Changes in v4:
- use virt_to_machine instead of virt_to_bus.

69908907

xen/x86: allow __set_phys_to_machine for autotranslate guests · 2f558d40

由 Stefano Stabellini 提交于 10月 09, 2013

Allow __set_phys_to_machine to be called for autotranslate guests.
It can be used to keep track of phys_to_machine changes, however we
don't do anything with the information at the moment.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

2f558d40

27 9月, 2013 1 次提交

xen/mmu: Correct PAT MST setting. · b1922a51

由 Konrad Rzeszutek Wilk 提交于 9月 25, 2013

Jan Beulich spotted that the PAT MSR settings in the Xen public
document that "the first (PAT6) column was wrong across the
board, and the column for PAT7 was missing altogether."

This updates it to be in sync.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NJan Beulich <jbeulich@suse.com>

b1922a51

25 9月, 2013 2 次提交

xen/p2m: check MFN is in range before using the m2p table · 0160676b

由 David Vrabel 提交于 9月 13, 2013

On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table.  There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override().  The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.

do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked.  do_page_fault() then deadlocks.

The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.

Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).

All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.

This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides.  This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
    range as it's probably foreign.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

0160676b

xen: Do not enable spinlocks before jump_label_init() has executed · a945928e

由 Konrad Rzeszutek Wilk 提交于 9月 12, 2013

xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be  a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>

a945928e

10 9月, 2013 6 次提交

xen/spinlock: Don't use __initdate for xen_pv_spin · c3b7cb1f

由 Konrad Rzeszutek Wilk 提交于 9月 09, 2013

As we get compile warnings about .init.data being
used by non-init functions.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

c3b7cb1f

Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM" · fb78e58c

由 Konrad Rzeszutek Wilk 提交于 8月 26, 2013

This reverts commit 70dd4998.

Now that the bugs have been resolved we can re-enable the
PV ticketlock implementation under PVHVM Xen guests.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

fb78e58c

xen/spinlock: Don't setup xen spinlock IPI kicker if disabled. · 3310bbed

由 Konrad Rzeszutek Wilk 提交于 8月 26, 2013

There is no need to setup this kicker IPI if we are never going
to use the paravirtualized ticketlock mechanism.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

3310bbed

xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVM · 26a79995

由 Konrad Rzeszutek Wilk 提交于 8月 16, 2013

Before this patch we would patch all of the pv_lock_ops sites
using alternative assembler. Then later in the bootup cycle
change the unlock_kick and lock_spinning to the Xen specific -
without re patching.

That meant that for the core of the kernel we would be running
with the baremetal version of unlock_kick and lock_spinning while
for modules we would have the proper Xen specific slowpaths.

As most of the module uses some API from the core kernel that ended
up with slowpath lockers waiting forever to be kicked (b/c they
would be using the Xen specific slowpath logic). And the
kick never came b/c the unlock path that was taken was the
baremetal one.

On PV we do not have the problem as we initialise before the
alternative code kicks in.

The fix is to make the updating of the pv_lock_ops function
be done before the alternative code starts patching.

Note that this patch fixes issues discovered by commit
f10cd522.
("xen: disable PV spinlocks on HVM") wherein it mentioned

PV spinlocks cannot possibly work with the current code because they are
enabled after pvops patching has already been done, and because PV
spinlocks use a different data structure than native spinlocks so we
cannot switch between them dynamically.

The first problem is solved by this patch.

The second problem has been solved by commit
816434ec
(Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)

P.S.
There is still the commit 70dd4998
(xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM) to
revert but that can be done later after all other bugs have been
fixed.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

26a79995

xen/spinlock: We don't need the old structure anymore · 6055aaf8

由 Konrad Rzeszutek Wilk 提交于 8月 13, 2013

As we are using the generic ticketlock structs and these
old structures are not needed anymore.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

6055aaf8

xen/spinlock: Fix locking path engaging too soon under PVHVM. · 1fb3a8b2

由 Konrad Rzeszutek Wilk 提交于 8月 13, 2013

The xen_lock_spinning has a check for the kicker interrupts
and if it is not initialized it will spin normally (not enter
the slowpath).

But for PVHVM case we would initialize the kicker interrupt
before the CPU came online. This meant that if the booting
CPU used a spinlock and went in the slowpath - it would
enter the slowpath and block forever. The forever part because
during bootup: the spinlock would be taken _before_ the CPU
sets itself to be online (more on this further), and we enter
to poll on the event channel forever.

The bootup CPU (see commit fc78d343
"xen/smp: initialize IPI vectors before marking CPU online"
for details) and the CPU that started the bootup consult
the cpu_online_mask to determine whether the booting CPU should
get an IPI. The booting CPU has to set itself in this mask via:

  set_cpu_online(smp_processor_id(), true);

However, if the spinlock is taken before this (and it is) and
it polls on an event channel - it will never be woken up as
the kernel will never send an IPI to an offline CPU.

Note that the PVHVM logic in sending IPIs is using the HVM
path which has numerous checks using the cpu_online_mask
and cpu_active_mask. See above mention git commit for details.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

1fb3a8b2

09 9月, 2013 1 次提交

xen/p2m: Don't call get_balloon_scratch_page() twice, keep interrupts disabled for multicalls · d7f8f48d

由 Boris Ostrovsky 提交于 9月 09, 2013

m2p_remove_override() calls get_balloon_scratch_page() in
MULTI_update_va_mapping() even though it already has pointer to this page from
the earlier call (in scratch_page). This second call doesn't have a matching
put_balloon_scratch_page() thus not restoring preempt count back. (Also, there
is no put_balloon_scratch_page() in the error path.)

In addition, the second multicall uses __xen_mc_entry() which does not disable
interrupts. Rearrange xen_mc_* calls to keep interrupts off while performing
multicalls.

This commit fixes a regression introduced by:

commit ee072640
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Tue Jul 23 17:23:54 2013 +0000

    xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

d7f8f48d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功