1. 08 5月, 2013 1 次提交
  2. 07 5月, 2013 1 次提交
  3. 06 5月, 2013 1 次提交
    • K
      xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268
      Konrad Rzeszutek Wilk 提交于
      If a user did:
      
      	echo 0 > /sys/devices/system/cpu/cpu1/online
      	echo 1 > /sys/devices/system/cpu/cpu1/online
      
      we would (this a build with DEBUG enabled) get to:
      smpboot: ++++++++++++++++++++=_---CPU UP  1
      .. snip..
      smpboot: Stack at about ffff880074c0ff44
      smpboot: CPU1: has booted.
      
      and hang. The RCU mechanism would kick in an try to IPI the CPU1
      but the IPIs (and all other interrupts) would never arrive at the
      CPU1. At first glance at least. A bit digging in the hypervisor
      trace shows that (using xenanalyze):
      
      [vla] d4v1 vec 243 injecting
         0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043163639 --|x d4v1 vmentry cycles 1468
      ]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043164913 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
         0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043165526 --|x d4v1 vmentry cycles 1472
      ]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043166800 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
      
      there is a pending event (subsequent debugging shows it is the IPI
      from the VCPU0 when smpboot.c on VCPU1 has done
      "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
      interrupted with the callback IPI (0xf3 aka 243) which ends up calling
      __xen_evtchn_do_upcall.
      
      The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
      the pending events. And the moment the guest does a 'cli' (that is the
      ffffffff81673254 in the log above) the hypervisor is invoked again to
      inject the IPI (0xf3) to tell the guest it has pending interrupts.
      This repeats itself forever.
      
      The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
      we set each per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
      to register per-CPU  structures (xen_vcpu_setup).
      This is used to allow events for more than 32 VCPUs and for performance
      optimizations reasons.
      
      When the user performs the VCPU hotplug we end up calling the
      the xen_vcpu_setup once more. We make the hypercall which returns
      -EINVAL as it does not allow multiple registration calls (and
      already has re-assigned where the events are being set). We pick
      the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
      However the hypervisor is still setting events in the register
      per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
      
      As such when the events are set by the hypervisor (such as timer one),
      and when we iterate in __xen_evtchn_do_upcall we end up reading stale
      events from the shared_info->vcpu_info[vcpu] instead of the
      per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
      events that the hypervisor has set and the hypervisor keeps on reminding
      us to ack the events which we never do.
      
      The fix is simple. Don't on the second time when xen_vcpu_setup is
      called over-write the per_cpu(xen_vcpu, cpu) if it points to
      per_cpu(xen_vcpu_info).
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7f1fc268
  4. 17 4月, 2013 2 次提交
    • K
      xen/time: Fix kasprintf splat when allocating timer%d IRQ line. · 7918c92a
      Konrad Rzeszutek Wilk 提交于
      When we online the CPU, we get this splat:
      
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      installing Xen timer for CPU 1
      BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
      Call Trace:
       [<ffffffff810c1fea>] __might_sleep+0xda/0x100
       [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff813036eb>] kvasprintf+0x5b/0x90
       [<ffffffff81303758>] kasprintf+0x38/0x40
       [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
       [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
       [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
      
      The solution to that is use kasprintf in the CPU hotplug path
      that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
      and remove the call to in xen_hvm_setup_cpu_clockevents.
      
      Unfortunatly the later is not a good idea as the bootup path
      does not use xen_hvm_cpu_notify so we would end up never allocating
      timer%d interrupt lines when booting. As such add the check for
      atomic() to continue.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7918c92a
    • D
      x86/xen: populate boot_params with EDD data · 96f28bc6
      David Vrabel 提交于
      During early setup of a dom0 kernel, populate boot_params with the
      Enhanced Disk Drive (EDD) and MBR signature data.  This makes
      information on the BIOS boot device available in /sys/firmware/edd/.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      96f28bc6
  5. 28 2月, 2013 1 次提交
    • K
      xen/pat: Disable PAT using pat_enabled value. · c79c4982
      Konrad Rzeszutek Wilk 提交于
      The git commit 8eaffa67
      (xen/pat: Disable PAT support for now) explains in details why
      we want to disable PAT for right now. However that
      change was not enough and we should have also disabled
      the pat_enabled value. Otherwise we end up with:
      
      mmap-example:3481 map pfn expected mapping type write-back for
      [mem 0x00010000-0x00010fff], got uncached-minus
       ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0()
      mem 0x00010000-0x00010fff], got uncached-minus
      ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774
      untrack_pfn+0xb8/0xd0()
      ...
      Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu
      Call Trace:
       [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0
       [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0
       [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100
       [<ffffffff81157459>] unmap_vmas+0x49/0x90
       [<ffffffff8115f808>] exit_mmap+0x98/0x170
       [<ffffffff810559a4>] mmput+0x64/0x100
       [<ffffffff810560f5>] dup_mm+0x445/0x660
       [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510
       [<ffffffff81057931>] do_fork+0x91/0x350
       [<ffffffff81057c76>] sys_clone+0x16/0x20
       [<ffffffff816ccbf9>] stub_clone+0x69/0x90
       [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f
      ---[ end trace 4918cdd0a4c9fea4 ]---
      
      (a similar message shows up if you end up launching 'mcelog')
      
      The call chain is (as analyzed by Liu, Jinsong):
      do_fork
        --> copy_process
          --> dup_mm
            --> dup_mmap
             	--> copy_page_range
                --> track_pfn_copy
                  --> reserve_pfn_range
                    --> line 624: flags != want_flags
      It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR
      (_PAGE_CACHE_UC_MINUS).
      
      Stefan Bader dug in this deep and found out that:
      "That makes it clearer as this will do
      
      reserve_memtype(...)
      --> pat_x_mtrr_type
        --> mtrr_type_lookup
          --> __mtrr_type_lookup
      
      And that can return -1/0xff in case of MTRR not being enabled/initialized. Which
      is not the case (given there are no messages for it in dmesg). This is not equal
      to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS.
      
      It looks like the problem starts early in reserve_memtype:
      
             	if (!pat_enabled) {
                      /* This is identical to page table setting without PAT */
                      if (new_type) {
                              if (req_type == _PAGE_CACHE_WC)
                                      *new_type = _PAGE_CACHE_UC_MINUS;
                              else
                                     	*new_type = req_type & _PAGE_CACHE_MASK;
                     	}
                      return 0;
              }
      
      This would be what we want, that is clearing the PWT and PCD flags from the
      supported flags - if pat_enabled is disabled."
      
      This patch does that - disabling PAT.
      
      CC: stable@vger.kernel.org # 3.3 and further
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Reported-and-Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reported-and-Tested-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c79c4982
  6. 15 2月, 2013 2 次提交
  7. 24 1月, 2013 1 次提交
  8. 18 12月, 2012 1 次提交
  9. 01 12月, 2012 1 次提交
  10. 29 11月, 2012 1 次提交
  11. 27 11月, 2012 1 次提交
  12. 02 11月, 2012 1 次提交
    • O
      xen PVonHVM: use E820_Reserved area for shared_info · 9d02b43d
      Olaf Hering 提交于
      This is a respin of 00e37bdb
      ("xen PVonHVM: move shared_info to MMIO before kexec").
      
      Currently kexec in a PVonHVM guest fails with a triple fault because the
      new kernel overwrites the shared info page. The exact failure depends on
      the size of the kernel image. This patch moves the pfn from RAM into an
      E820 reserved memory area.
      
      The pfn containing the shared_info is located somewhere in RAM. This will
      cause trouble if the current kernel is doing a kexec boot into a new
      kernel. The new kernel (and its startup code) can not know where the pfn
      is, so it can not reserve the page. The hypervisor will continue to update
      the pfn, and as a result memory corruption occours in the new kernel.
      
      The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
      E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
      starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
      will usually not allocate areas up to FE700000, so FE700000 is expected
      to work also with older toolstacks.
      
      In Xen3 there is no reserved area at a fixed location. If the guest is
      started on such old hosts the shared_info page will be placed in RAM. As
      a result kexec can not be used.
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9d02b43d
  13. 20 10月, 2012 1 次提交
  14. 12 10月, 2012 2 次提交
    • K
      xen/bootup: allow {read|write}_cr8 pvops call. · 1a7bbda5
      Konrad Rzeszutek Wilk 提交于
      We actually do not do anything about it. Just return a default
      value of zero and if the kernel tries to write anything but 0
      we BUG_ON.
      
      This fixes the case when an user tries to suspend the machine
      and it blows up in save_processor_state b/c 'read_cr8' is set
      to NULL and we get:
      
      kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
      invalid opcode: 0000 [#1] SMP
      Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
      RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270
      
      .. snip..
      Call Trace:
       [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
       [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
       [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1a7bbda5
    • K
      xen/bootup: allow read_tscp call for Xen PV guests. · cd0608e7
      Konrad Rzeszutek Wilk 提交于
      The hypervisor will trap it. However without this patch,
      we would crash as the .read_tscp is set to NULL. This patch
      fixes it and sets it to the native_read_tscp call.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cd0608e7
  15. 24 9月, 2012 1 次提交
  16. 20 9月, 2012 1 次提交
    • K
      xen/boot: Disable BIOS SMP MP table search. · bd49940a
      Konrad Rzeszutek Wilk 提交于
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Lets follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (b/c ioremap tries to get page aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP
      bypasses the P2M lookup we would happily set the PTE to 1000461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      CC: stable@vger.kernel.org
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bd49940a
  17. 23 8月, 2012 2 次提交
    • K
      xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. · 3699aad0
      Konrad Rzeszutek Wilk 提交于
      We don't need to return the new PGD - as we do not use it.
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3699aad0
    • K
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk 提交于
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      in the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value which is not
      neccessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so lets just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
  18. 22 8月, 2012 3 次提交
    • K
      xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. · b8b0f559
      Konrad Rzeszutek Wilk 提交于
      There is no need for those functions/variables to be visible. Make them
      static and also fix the compile warnings of this sort:
      
      drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static?
      
      Some of them just require including the header file that
      declares the functions.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b8b0f559
    • K
      xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. · 806c312e
      Konrad Rzeszutek Wilk 提交于
      If a 64-bit hypervisor is booted with a 32-bit initial domain,
      the hypervisor deals with the initial domain as "compat" and
      does some extra adjustments (like pagetables are 4 bytes instead
      of 8). It also adjusts the xen_start_info->pt_base incorrectly.
      
      When booted with a 32-bit hypervisor (32-bit initial domain):
      ..
      (XEN)  Start info:    cf831000->cf83147c
      (XEN)  Page tables:   cf832000->cf8b5000
      ..
      [    0.000000] PT: cf832000 (f832000)
      [    0.000000] Reserving PT: f832000->f8b5000
      
      And with a 64-bit hypervisor:
      (XEN)  Start info:    00000000cf831000->00000000cf8314b4
      (XEN)  Page tables:   00000000cf832000->00000000cf8b6000
      
      [    0.000000] PT: cf834000 (f834000)
      [    0.000000] Reserving PT: f834000->f8b8000
      
      To deal with this, we keep keep track of the highest physical
      address we have reserved via memblock_reserve. If that address
      does not overlap with pt_base, we have a gap which we reserve.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      806c312e
    • K
      xen/x86: Use memblock_reserve for sensitive areas. · 59b29440
      Konrad Rzeszutek Wilk 提交于
      instead of a big memblock_reserve. This way we can be more
      selective in freeing regions (and it also makes it easier
      to understand where is what).
      
      [v1: Move the auto_translate_physmap to proper line]
      [v2: Per Stefano suggestion add more comments]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      59b29440
  19. 17 8月, 2012 1 次提交
  20. 14 9月, 2012 1 次提交
  21. 20 7月, 2012 7 次提交
  22. 08 6月, 2012 1 次提交
  23. 01 6月, 2012 1 次提交
    • A
      xen/setup: filter APERFMPERF cpuid feature out · 5e626254
      Andre Przywara 提交于
      Xen PV kernels allow access to the APERF/MPERF registers to read the
      effective frequency. Access to the MSRs is however redirected to the
      currently scheduled physical CPU, making consecutive read and
      compares unreliable. In addition each rdmsr traps into the hypervisor.
      So to avoid bogus readouts and expensive traps, disable the kernel
      internal feature flag for APERF/MPERF if running under Xen.
      This will
      a) remove the aperfmperf flag from /proc/cpuinfo
      b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
         use the feature to improve scheduling (by default disabled)
      c) not mislead the cpufreq driver to use the MSRs
      
      This does not cover userland programs which access the MSRs via the
      device file interface, but this will be addressed separately.
      Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5e626254
  24. 31 5月, 2012 1 次提交
  25. 08 5月, 2012 4 次提交