1. 06 Oct, 2015 1 commit
    • x86/xen/p2m: hint at the last populated P2M entry · 98dd166e
      David Vrabel committed
      With commit 633d6f17 (x86/xen: prepare p2m list for memory hotplug)
      the P2M may be sized to accommodate a much larger amount of memory
      than the domain currently has.
      
      When saving a domain, the toolstack must scan all the P2M looking for
      populated pages.  This results in a performance regression due to the
      unnecessary scanning.
      
      Instead of reporting (via shared_info) the maximum possible size of
      the P2M, hint at the last PFN which might be populated.  This hint is
      increased as new leaves are added to the P2M (in the expectation that
      they will be used for populated entries).
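As a rough illustration of the hint described above (not the kernel's actual code), the sketch below raises a "last possibly populated PFN" value whenever a leaf is added. The function name `raise_p2m_hint`, the constant `P2M_ENTRIES_PER_LEAF`, and the choice to round the hint up to the end of the new leaf are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a P2M leaf is one 4 KiB page holding 8-byte entries. */
#define P2M_ENTRIES_PER_LEAF 512ULL

/*
 * Hypothetical helper: given the current hint and a PFN that falls inside a
 * newly added leaf, return the new hint. The hint never shrinks and is
 * rounded up to the last PFN that the new leaf can describe.
 */
uint64_t raise_p2m_hint(uint64_t hint, uint64_t pfn)
{
    uint64_t leaf_index = pfn / P2M_ENTRIES_PER_LEAF;
    uint64_t leaf_last;

    /* Guard against overflow when computing the end of the leaf. */
    assert(leaf_index < UINT64_MAX / P2M_ENTRIES_PER_LEAF);
    leaf_last = (leaf_index + 1) * P2M_ENTRIES_PER_LEAF - 1;

    return leaf_last > hint ? leaf_last : hint;
}
```

A toolstack that reads such a hint only needs to scan PFNs up to the returned value rather than the full capacity of the P2M.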
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org> # 4.0+
  2. 29 Sep, 2015 1 commit
  3. 28 Sep, 2015 3 commits
    • x86/xen: Support kexec/kdump in HVM guests by doing a soft reset · 0b34a166
      Vitaly Kuznetsov committed
      Currently there are a number of issues preventing PVHVM Xen guests from
      doing successful kexec/kdump:
      
        - Bound event channels.
        - Registered vcpu_info.
        - PIRQ/emuirq mappings.
        - shared_info frame after XENMAPSPACE_shared_info operation.
        - Active grant mappings.
      
      Basically, the newly booted kernel stumbles upon already set up Xen
      interfaces and there is no way to reestablish them. In Xen-4.7 a new
      feature called 'soft reset' is coming. A guest performing a kexec/kdump
      operation is supposed to call the SCHEDOP_shutdown hypercall with the
      SHUTDOWN_soft_reset reason before jumping to the new kernel. The
      hypervisor (with some help from the toolstack) will do a full domain
      cleanup (but keeping its memory and vCPU contexts intact), returning
      the guest to the state it had when it was first booted and thus
      allowing it to start over.
      
      Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
      probably OK as by default all unknown shutdown reasons cause domain
      destroy with a message in toolstack log: 'Unknown shutdown reason code
      5. Destroying domain.'  which gives a clue to what the problem is and
      eliminates false expectations.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen/x86: Don't try to write syscall-related MSRs for PV guests · 2ecf91b6
      Boris Ostrovsky committed
      For PV guests these registers are set up by the hypervisor and thus
      should not be written by the guest. The comment in xen_write_msr_safe()
      says so but we still write the MSRs, causing the hypervisor to
      print a warning.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen: use correct type for HYPERVISOR_memory_op() · 24f775a6
      Juergen Gross committed
      HYPERVISOR_memory_op() is defined to return an "int" value. This is
      wrong, as the Xen hypervisor will return "long".
      
      The sub-function XENMEM_maximum_reservation returns the maximum
      number of pages for the current domain. An int will overflow for a
      domain configured with 8TB of memory or more.
      
      Correct this by using the correct type.
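The 8TB threshold follows from simple arithmetic, sketched below under the assumption of 4 KiB pages and a 32-bit int: 8 TB is 2^43 bytes, which is 2^31 pages, one more than the largest value a signed 32-bit integer can hold. The helper names `bytes_to_pages` and `page_count_fits_int32` are made up for this illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Number of whole pages contained in `bytes` of memory. */
uint64_t bytes_to_pages(uint64_t bytes)
{
    return bytes >> PAGE_SHIFT;
}

/* Returns 1 if the page count is representable as a signed 32-bit int. */
int page_count_fits_int32(uint64_t pages)
{
    return pages <= (uint64_t)INT32_MAX;
}
```

With these helpers, `bytes_to_pages(1ULL << 43)` yields 2147483648, which no longer fits, while a domain one page smaller than 8 TB still does.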
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  4. 09 Sep, 2015 2 commits
  5. 08 Sep, 2015 4 commits
    • xen: switch extra memory accounting to use pfns · 626d7508
      Juergen Gross committed
      Instead of using physical addresses for accounting of extra memory
      areas available for ballooning, switch to pfns, as this is much less
      error-prone with respect to partial pages.
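To show why byte-based accounting is fragile around partial pages, here is a minimal sketch (not the kernel's code) that converts a byte range into the number of whole pages it contains. The names `pfn_up`, `pfn_down` and `whole_pages` are chosen for this example, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Frame number of the first page boundary at or after `addr`. */
uint64_t pfn_up(uint64_t addr)
{
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Frame number of the page containing `addr`. */
uint64_t pfn_down(uint64_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Number of pages lying entirely inside the byte range [start, end). */
uint64_t whole_pages(uint64_t start, uint64_t end)
{
    uint64_t first, last;

    assert(start <= end);
    first = pfn_up(start);
    last = pfn_down(end);
    return last > first ? last - first : 0;
}
```

For the range 0x1800 to 0x4800, dividing the 0x3000 bytes by the page size suggests three pages, but `whole_pages` reports two, because the first and last pages are only partially covered.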
      Reported-by: Roger Pau Monné <roger.pau@citrix.com>
      Tested-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen: limit memory to architectural maximum · cb9e444b
      Juergen Gross committed
      When a pv-domain (including dom0) is started it tries to size its
      p2m list according to the maximum possible memory amount it ever can
      achieve. Limit the initial maximum memory size to the architectural
      limit of the hardware in order to avoid overflows during remapping
      of memory.
      
      This problem will occur when dom0 is started with an initial memory
      size being a multiple of 1GB, but without specifying its maximum
      memory size. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
      Reported-by: Roger Pau Monné <roger.pau@citrix.com>
      Tested-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen: avoid another early crash of memory limited dom0 · ab24507c
      Juergen Gross committed
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory occurring
      only on some hardware.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
      of this is true and the E820 map of the machine is sparse (some areas
      are not covered) then the machine might crash early in the boot
      process.
      
      An example E820 map triggering the problem looks like this:
      
      [    0.000000] e820: BIOS-provided physical RAM map:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
      
      In this case the area a0000-dffff isn't present in the map. This will
      confuse the memory setup of the domain when remapping the memory from
      such holes to populated areas.
      
      To avoid the problem, the accounting of memory to be remapped has to
      count such holes in the E820 map as well.
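The idea of counting holes can be sketched as follows. This is a simplified illustration and not the kernel's implementation: `struct mem_region`, `example_map` and `count_hole_pages` are names invented here, the map holds only the first four entries quoted above (with end addresses made exclusive), and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Simplified stand-in for an E820 entry covering bytes [start, end). */
struct mem_region {
    uint64_t start;
    uint64_t end;
};

/* First four entries of the map quoted above, with exclusive end addresses. */
const struct mem_region example_map[] = {
    { 0x0000000000000000ULL, 0x000000000009d800ULL }, /* usable */
    { 0x000000000009d800ULL, 0x00000000000a0000ULL }, /* reserved */
    { 0x00000000000e0000ULL, 0x0000000000100000ULL }, /* reserved */
    { 0x0000000000100000ULL, 0x00000000cf7fb000ULL }, /* usable */
};

/*
 * Count the pages that no entry covers between consecutive regions.
 * The map must be sorted by start address and must not overlap.
 */
uint64_t count_hole_pages(const struct mem_region *map, size_t n)
{
    uint64_t pages = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (map[i].start > map[i - 1].end)
            pages += (map[i].start - map[i - 1].end) >> PAGE_SHIFT;
    }
    return pages;
}
```

For these four entries the only gap is a0000-dffff, which is 0x40000 bytes or 64 pages; an accounting scheme that sums only the listed entries would miss them.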
      Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen: avoid early crash of memory limited dom0 · eafd72e0
      Juergen Gross committed
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same and exactly a multiple of 1 GB. The
      kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
      for the problem to happen. In this case it will crash very early
      during boot due to the virtual mapped p2m list not being large
      enough to be able to remap any memory:
      
      (XEN) Freed 304kB init memory.
      mapping kernel into physical memory
      about to get started...
      (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.5.2-pre  x86_64  debug=n Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff81d120cb>]
      (XEN) RFLAGS: 0000000000000206   EM: 1 CONTEXT: pv guest (d0v0)
      (XEN) rax: ffffffff81db2000   rbx: 000000004d000000   rcx: 0000000000000000
      (XEN) rdx: 000000004d000000   rsi: 0000000000063000   rdi: 000000004d063000
      (XEN) rbp: ffffffff81c03d78   rsp: ffffffff81c03d28   r8:  0000000000023000
      (XEN) r9:  00000001040ff000   r10: 0000000000007ff0   r11: 0000000000000000
      (XEN) r12: 0000000000063000   r13: 000000000004d000   r14: 0000000000000063
      (XEN) r15: 0000000000000063   cr0: 0000000080050033   cr4: 00000000000006f0
      (XEN) cr3: 0000000105c0f000   cr2: ffffc90000268000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81c03d28:
      (XEN)   0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
      (XEN)   0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
      (XEN)   0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
      (XEN)   ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
      (XEN)   00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
      (XEN)   ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
      (XEN)   ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
      (XEN)   0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
      (XEN)   00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
      (XEN)   0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
      (XEN)   ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
      (XEN)   0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
      (XEN)   0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
      (XEN)   0300000100000032 0000000000000005 0000000000000020 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
      
      This can be avoided by allocating enough space for the p2m to cover
      the maximum memory of dom0 plus the identity mapped holes required
      for PCI space, BIOS etc.
      Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  6. 20 Aug, 2015 22 commits
  7. 10 Aug, 2015 1 commit
    • x86/xen: build "Xen PV" APIC driver for domU as well · fc5fee86
      Jason A. Donenfeld committed
      It turns out that a PV domU also requires the "Xen PV" APIC
      driver. Otherwise, the flat driver is used and we get stuck in busy
      loops that never exit, such as in this stack trace:
      
      (gdb) target remote localhost:9999
      Remote debugging using localhost:9999
      __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
      56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
      (gdb) bt
       #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
       #1  __default_send_IPI_shortcut (shortcut=<optimized out>,
      dest=<optimized out>, vector=<optimized out>) at
      ./arch/x86/include/asm/ipi.h:75
       #2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
       #3  0xffffffff81011336 in arch_irq_work_raise () at
      arch/x86/kernel/irq_work.c:47
       #4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
      kernel/irq_work.c:100
       #5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
       #6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
      out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
      fmt=<optimized out>, args=<optimized out>)
          at kernel/printk/printk.c:1778
       #7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
      kernel/printk/printk.c:1868
       #8  0xffffffffc00013ea in ?? ()
       #9  0x0000000000000000 in ?? ()
      
      Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  8. 31 Jul, 2015 2 commits
  9. 06 Jul, 2015 1 commit
    • x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks · 9261e050
      Andy Lutomirski committed
      We've had ->read_tsc() and ->read_tscp() paravirt hooks since
      the very beginning of paravirt, i.e.,
      
        d3561b7f ("[PATCH] paravirt: header and stubs for paravirtualisation").
      
      AFAICT, the only paravirt guest implementation that ever
      replaced these calls was vmware, and it's gone. Arguably even
      vmware shouldn't have hooked RDTSC -- we fully support systems
      that don't have a TSC at all, so there's no point for a paravirt
      implementation to pretend that we have a TSC but to replace it.
      
      I also doubt that these hooks actually worked. Calls to rdtscl()
      and rdtscll(), which respected the hooks, were used seemingly
      interchangeably with native_read_tsc(), which did not.
      
      Just remove them. If anyone ever needs them again, they can try
      to make a case for why they need them.
      
      Before, on a paravirt config:
            text    data     bss      dec    hex filename
        12618257 1816384 1093632 15528273 ecf151 vmlinux
      
      After:
            text    data     bss      dec    hex filename
        12617207 1816384 1093632 15527223 eced37 vmlinux
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: virtualization@lists.linux-foundation.org
      Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 08 Jun, 2015 3 commits
    • x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32 · b2502b41
      Ingo Molnar committed
      The 'system_call' entry points differ starkly between native 32-bit and 64-bit
      kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on
      64-bit it's the SYSCALL entry point.
      
      This is pretty confusing when looking at generic code, and it also obscures
      the nature of the entry point at the assembly level.
      
      So untangle this by splitting the name into its two uses:
      
      	system_call (32) -> entry_INT80_32
      	system_call (64) -> entry_SYSCALL_64
      
      As per the generic naming scheme for x86 system call entry points:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat · 4c8cd0c5
      Ingo Molnar committed
      So the SYSENTER instruction is pretty quirky and it has different behavior
      depending on bitness and CPU maker.
      
      Yet we create a false sense of coherency by naming it 'ia32_sysenter_target'
      in both of the cases.
      
      Split the name into its two uses:
      
      	ia32_sysenter_target (32)    -> entry_SYSENTER_32
      	ia32_sysenter_target (64)    -> entry_SYSENTER_compat
      
      As per the generic naming scheme for x86 system call entry points:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry: Rename compat syscall entry points · 2cd23553
      Ingo Molnar committed
      Rename the following system call entry points:
      
      	ia32_cstar_target       -> entry_SYSCALL_compat
      	ia32_syscall            -> entry_INT80_compat
      
      The generic naming scheme for x86 system call entry points is:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>