1. 07 6月, 2013 1 次提交
  2. 29 5月, 2013 1 次提交
    • S
      xen: Clean up apic ipi interface · 1db01b49
      Stefan Bader 提交于
      Commit f447d56d introduced the
      implementation of the PV apic ipi interface. But there were some
      odd things (it seems none of which cause really any issue but
      maybe they should be cleaned up anyway):
       - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
         ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
         vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
       - physflat_send_IPI_allbutself is declared unnecessarily. It is never
         used.
      
      This patch tries to clean up those things.
      Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1db01b49
  3. 08 5月, 2013 3 次提交
  4. 07 5月, 2013 1 次提交
  5. 06 5月, 2013 1 次提交
    • K
      xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268
      Konrad Rzeszutek Wilk 提交于
      If a user did:
      
      	echo 0 > /sys/devices/system/cpu/cpu1/online
      	echo 1 > /sys/devices/system/cpu/cpu1/online
      
      we would (this a build with DEBUG enabled) get to:
      smpboot: ++++++++++++++++++++=_---CPU UP  1
      .. snip..
      smpboot: Stack at about ffff880074c0ff44
      smpboot: CPU1: has booted.
      
      and hang. The RCU mechanism would kick in an try to IPI the CPU1
      but the IPIs (and all other interrupts) would never arrive at the
      CPU1. At first glance at least. A bit digging in the hypervisor
      trace shows that (using xenanalyze):
      
      [vla] d4v1 vec 243 injecting
         0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043163639 --|x d4v1 vmentry cycles 1468
      ]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043164913 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
         0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043165526 --|x d4v1 vmentry cycles 1472
      ]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043166800 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
      
      there is a pending event (subsequent debugging shows it is the IPI
      from the VCPU0 when smpboot.c on VCPU1 has done
      "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
      interrupted with the callback IPI (0xf3 aka 243) which ends up calling
      __xen_evtchn_do_upcall.
      
      The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
      the pending events. And the moment the guest does a 'cli' (that is the
      ffffffff81673254 in the log above) the hypervisor is invoked again to
      inject the IPI (0xf3) to tell the guest it has pending interrupts.
      This repeats itself forever.
      
      The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
      we set each per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
      to register per-CPU  structures (xen_vcpu_setup).
      This is used to allow events for more than 32 VCPUs and for performance
      optimizations reasons.
      
      When the user performs the VCPU hotplug we end up calling the
      the xen_vcpu_setup once more. We make the hypercall which returns
      -EINVAL as it does not allow multiple registration calls (and
      already has re-assigned where the events are being set). We pick
      the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
      However the hypervisor is still setting events in the register
      per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
      
      As such when the events are set by the hypervisor (such as timer one),
      and when we iterate in __xen_evtchn_do_upcall we end up reading stale
      events from the shared_info->vcpu_info[vcpu] instead of the
      per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
      events that the hypervisor has set and the hypervisor keeps on reminding
      us to ack the events which we never do.
      
      The fix is simple. Don't on the second time when xen_vcpu_setup is
      called over-write the per_cpu(xen_vcpu, cpu) if it points to
      per_cpu(xen_vcpu_info).
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7f1fc268
  6. 17 4月, 2013 9 次提交
    • K
      xen/smp: Unifiy some of the PVs and PVHVM offline CPU path · b12abaa1
      Konrad Rzeszutek Wilk 提交于
      The "xen_cpu_die" and "xen_hvm_cpu_die" are very similar.
      Lets coalesce them.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b12abaa1
    • K
      xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. · 27d8b207
      Konrad Rzeszutek Wilk 提交于
      There is no need to use the PV version of the IRQ_WORKER mechanism
      as under PVHVM we are using the native version. The native
      version is using the SMP API.
      
      They just sit around unused:
      
        69:          0          0  xen-percpu-ipi       irqwork0
        83:          0          0  xen-percpu-ipi       irqwork1
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      27d8b207
    • K
      xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM · 70dd4998
      Konrad Rzeszutek Wilk 提交于
      See git commit f10cd522
      (xen: disable PV spinlocks on HVM) for details.
      
      But we did not disable it everywhere - which means that when
      we boot as PVHVM we end up allocating per-CPU irq line for
      spinlock. This fixes that.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      70dd4998
    • K
      xen/spinlock: Check against default value of -1 for IRQ line. · cb9c6f15
      Konrad Rzeszutek Wilk 提交于
      The default (uninitialized) value of the IRQ line is -1.
      Check if we already have allocated an spinlock interrupt line
      and if somebody is trying to do it again. Also set it to -1
      when we offline the CPU.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cb9c6f15
    • K
      xen/time: Add default value of -1 for IRQ and check for that. · ef35a4e6
      Konrad Rzeszutek Wilk 提交于
      If the timer interrupt has been de-init or is just now being
      initialized, the default value of -1 should be preset as
      interrupt line. Check for that and if something is odd
      WARN us.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ef35a4e6
    • K
      xen/time: Fix kasprintf splat when allocating timer%d IRQ line. · 7918c92a
      Konrad Rzeszutek Wilk 提交于
      When we online the CPU, we get this splat:
      
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      installing Xen timer for CPU 1
      BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
      Call Trace:
       [<ffffffff810c1fea>] __might_sleep+0xda/0x100
       [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff813036eb>] kvasprintf+0x5b/0x90
       [<ffffffff81303758>] kasprintf+0x38/0x40
       [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
       [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
       [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
      
      The solution to that is use kasprintf in the CPU hotplug path
      that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
      and remove the call to in xen_hvm_setup_cpu_clockevents.
      
      Unfortunatly the later is not a good idea as the bootup path
      does not use xen_hvm_cpu_notify so we would end up never allocating
      timer%d interrupt lines when booting. As such add the check for
      atomic() to continue.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7918c92a
    • K
      xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline · 66ff0fe9
      Konrad Rzeszutek Wilk 提交于
      While we don't use the spinlock interrupt line (see for details
      commit f10cd522 -
      xen: disable PV spinlocks on HVM) - we should still do the proper
      init / deinit sequence. We did not do that correctly and for the
      CPU init for PVHVM guest we would allocate an interrupt line - but
      failed to deallocate the old interrupt line.
      
      This resulted in leakage of an irq_desc but more importantly this splat
      as we online an offlined CPU:
      
      genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
      Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
      Call Trace:
       [<ffffffff811156de>] __setup_irq+0x23e/0x4a0
       [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
       [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff810cad35>] ? update_max_interval+0x15/0x40
       [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
       [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
       [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
       [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff8109402b>] __cpu_notify+0x1b/0x30
       [<ffffffff8166834a>] _cpu_up+0xa0/0x14b
       [<ffffffff816684ce>] cpu_up+0xd9/0xec
       [<ffffffff8165f754>] store_online+0x94/0xd0
       [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
       [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
       [<ffffffff811a2864>] vfs_write+0xb4/0x130
       [<ffffffff811a302a>] sys_write+0x5a/0xa0
       [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
      cpu 1 spinlock event irq -16
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      
      And if one looks at the /proc/interrupts right after
      offlining (CPU1):
      
        70:          0          0  xen-percpu-ipi       spinlock0
        71:          0          0  xen-percpu-ipi       spinlock1
        77:          0          0  xen-percpu-ipi       spinlock2
      
      There is the oddity of the 'spinlock1' still being present.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      66ff0fe9
    • K
      xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. · 888b65b4
      Konrad Rzeszutek Wilk 提交于
      In the PVHVM path when we do CPU online/offline path we would
      leak the timer%d IRQ line everytime we do a offline event. The
      online path (xen_hvm_setup_cpu_clockevents via
      x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
      line for the timer%d.
      
      But we would still use the old interrupt line leading to:
      
      kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
      invalid opcode: 0000 [#1] SMP
      RIP: 0010:[<ffffffff810b9e21>]  [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
      .. snip..
       <IRQ>
       [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
       [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
       [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
       [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
       [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
       [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
       [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
       <EOI>
       [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
       [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
      
      There is also the oddity (timer1) in the /proc/interrupts after
      offlining CPU1:
      
        64:       1121          0  xen-percpu-virq      timer0
        78:          0          0  xen-percpu-virq      timer1
        84:          0       2483  xen-percpu-virq      timer2
      
      This patch fixes it.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      888b65b4
    • D
      x86/xen: populate boot_params with EDD data · 96f28bc6
      David Vrabel 提交于
      During early setup of a dom0 kernel, populate boot_params with the
      Enhanced Disk Drive (EDD) and MBR signature data.  This makes
      information on the BIOS boot device available in /sys/firmware/edd/.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      96f28bc6
  7. 12 4月, 2013 2 次提交
  8. 11 4月, 2013 1 次提交
  9. 08 4月, 2013 1 次提交
  10. 03 4月, 2013 1 次提交
    • K
      xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen provided pagetables. · b2222794
      Konrad Rzeszutek Wilk 提交于
      Occassionaly on a DL380 G4 the guest would crash quite early with this:
      
      (XEN) d244:v0: unhandled page fault (ec=0003)
      (XEN) Pagetable walk from ffffffff84dc7000:
      (XEN)  L4[0x1ff] = 00000000c3f18067 0000000000001789
      (XEN)  L3[0x1fe] = 00000000c3f14067 000000000000178d
      (XEN)  L2[0x026] = 00000000dc8b2067 0000000000004def
      (XEN)  L1[0x1c7] = 00100000dc8da067 0000000000004dc7
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 244 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.1.3OVM  x86_64  debug=n  Not tainted ]----
      (XEN) CPU:    3
      (XEN) RIP:    e033:[<ffffffff81263f22>]
      (XEN) RFLAGS: 0000000000000216   EM: 1   CONTEXT: pv guest
      (XEN) rax: 0000000000000000   rbx: ffffffff81785f88   rcx: 000000000000003f
      (XEN) rdx: 0000000000000000   rsi: 00000000dc8da063   rdi: ffffffff84dc7000
      
      The offending code shows it to be a loop writting the value zero
      (%rax) in the %rdi (the L4 provided by Xen) register:
      
         0: 44 00 00             add    %r8b,(%rax)
         3: 31 c0                 xor    %eax,%eax
         5: b9 40 00 00 00       mov    $0x40,%ecx
         a: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
        11: 00 00
        13: ff c9                 dec    %ecx
        15:* 48 89 07             mov    %rax,(%rdi)     <-- trapping instruction
        18: 48 89 47 08           mov    %rax,0x8(%rdi)
        1c: 48 89 47 10           mov    %rax,0x10(%rdi)
      
      which fails. xen_setup_kernel_pagetable recycles some of the Xen's
      page-table entries when it has switched over to its Linux page-tables.
      
      Right before try to clear the page, we  make a hypercall to change
      it from _RO to  _RW and that works (otherwise we would hit an BUG()).
      And the _RW flag is set for that page:
      (XEN)  L1[0x1c7] = 001000004885f067 0000000000004dc7
      
      The error code is 3, so PFEC_page_present and PFEC_write_access, so page is
      present (correct), and we tried to write to the page, but a violation
      occurred. The one theory is that the the page entries in hardware
      (which are cached) are not up to date with what we just set. Especially
      as we have just done an CR3 write and flushed the multicalls.
      
      This patch does solve the problem by flusing out the TLB page
      entry after changing it from _RO to _RW and we don't hit this
      issue anymore.
      
      Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO
      'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4]
      Reported-and-Tested-by: NSaar Maoz <Saar.Maoz@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b2222794
  11. 28 3月, 2013 1 次提交
  12. 05 3月, 2013 1 次提交
  13. 28 2月, 2013 1 次提交
    • K
      xen/pat: Disable PAT using pat_enabled value. · c79c4982
      Konrad Rzeszutek Wilk 提交于
      The git commit 8eaffa67
      (xen/pat: Disable PAT support for now) explains in details why
      we want to disable PAT for right now. However that
      change was not enough and we should have also disabled
      the pat_enabled value. Otherwise we end up with:
      
      mmap-example:3481 map pfn expected mapping type write-back for
      [mem 0x00010000-0x00010fff], got uncached-minus
       ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0()
      mem 0x00010000-0x00010fff], got uncached-minus
      ------------[ cut here ]------------
      WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774
      untrack_pfn+0xb8/0xd0()
      ...
      Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu
      Call Trace:
       [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0
       [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0
       [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100
       [<ffffffff81157459>] unmap_vmas+0x49/0x90
       [<ffffffff8115f808>] exit_mmap+0x98/0x170
       [<ffffffff810559a4>] mmput+0x64/0x100
       [<ffffffff810560f5>] dup_mm+0x445/0x660
       [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510
       [<ffffffff81057931>] do_fork+0x91/0x350
       [<ffffffff81057c76>] sys_clone+0x16/0x20
       [<ffffffff816ccbf9>] stub_clone+0x69/0x90
       [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f
      ---[ end trace 4918cdd0a4c9fea4 ]---
      
      (a similar message shows up if you end up launching 'mcelog')
      
      The call chain is (as analyzed by Liu, Jinsong):
      do_fork
        --> copy_process
          --> dup_mm
            --> dup_mmap
             	--> copy_page_range
                --> track_pfn_copy
                  --> reserve_pfn_range
                    --> line 624: flags != want_flags
      It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR
      (_PAGE_CACHE_UC_MINUS).
      
      Stefan Bader dug in this deep and found out that:
      "That makes it clearer as this will do
      
      reserve_memtype(...)
      --> pat_x_mtrr_type
        --> mtrr_type_lookup
          --> __mtrr_type_lookup
      
      And that can return -1/0xff in case of MTRR not being enabled/initialized. Which
      is not the case (given there are no messages for it in dmesg). This is not equal
      to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS.
      
      It looks like the problem starts early in reserve_memtype:
      
             	if (!pat_enabled) {
                      /* This is identical to page table setting without PAT */
                      if (new_type) {
                              if (req_type == _PAGE_CACHE_WC)
                                      *new_type = _PAGE_CACHE_UC_MINUS;
                              else
                                     	*new_type = req_type & _PAGE_CACHE_MASK;
                     	}
                      return 0;
              }
      
      This would be what we want, that is clearing the PWT and PCD flags from the
      supported flags - if pat_enabled is disabled."
      
      This patch does that - disabling PAT.
      
      CC: stable@vger.kernel.org # 3.3 and further
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Reported-and-Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reported-and-Tested-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c79c4982
  14. 23 2月, 2013 1 次提交
    • K
      x86-64, xen, mmu: Provide an early version of write_cr3. · 0cc9129d
      Konrad Rzeszutek Wilk 提交于
      With commit 8170e6be ("x86, 64bit: Use a #PF handler to materialize
      early mappings on demand") we started hitting an early bootup crash
      where the Xen hypervisor would inform us that:
      
          (XEN) d7:v0: unhandled page fault (ec=0000)
          (XEN) Pagetable walk from ffffea000005b2d0:
          (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
          (XEN) domain_crash_sync called from entry.S
          (XEN) Domain 7 (vcpu#0) crashed on cpu#3:
          (XEN) ----[ Xen-4.2.0  x86_64  debug=n  Not tainted ]----
      
      .. that Xen was unable to context switch back to dom0.
      
      Looking at the calling stack we find:
      
          [<ffffffff8103feba>] xen_get_user_pgd+0x5a  <--
          [<ffffffff8103feba>] xen_get_user_pgd+0x5a
          [<ffffffff81042d27>] xen_write_cr3+0x77
          [<ffffffff81ad2d21>] init_mem_mapping+0x1f9
          [<ffffffff81ac293f>] setup_arch+0x742
          [<ffffffff81666d71>] printk+0x48
      
      We are trying to figure out whether we need to up-date the user PGD as
      well.  Please keep in mind that under 64-bit PV guests we have a limited
      amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel
      and user-space.  As such the Linux pvops'fied version of write_cr3
      checks if it has to update the user-space cr3 as well.
      
      That clearly is not needed during early bootup.  The recent changes (see
      above git commit) streamline the x86 page table allocation to be much
      simpler (And also incidentally the #PF handler ends up in spirit being
      similar to how the Xen toolstack sets up the initial page-tables).
      
      The fix is to have an early-bootup version of cr3 that just loads the
      kernel %cr3.  The later version - which also handles user-page
      modifications will be used after the initial page tables have been
      setup.
      
      [ hpa: removed a redundant #ifdef and made the new function __init.
        Also note that x86-32 already has such an early xen_write_cr3. ]
      Tested-by: N"H. Peter Anvin" <hpa@zytor.com>
      Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0cc9129d
  15. 20 2月, 2013 2 次提交
    • S
      xen: Send spinlock IPI to all waiters · 76eaca03
      Stefan Bader 提交于
      There is a loophole between Xen's current implementation of
      pv-spinlocks and the scheduler. This was triggerable through
      a testcase until v3.6 changed the TLB flushing code. The
      problem potentially is still there just not observable in the
      same way.
      
      What could happen was (is):
      
      1. CPU n tries to schedule task x away and goes into a slow
         wait for the runq lock of CPU n-# (must be one with a lower
         number).
      2. CPU n-#, while processing softirqs, tries to balance domains
         and goes into a slow wait for its own runq lock (for updating
         some records). Since this is a spin_lock_irqsave in softirq
         context, interrupts will be re-enabled for the duration of
         the poll_irq hypercall used by Xen.
      3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
         an interrupt (e.g. endio) and when processing the interrupt,
         tries to wake up task x. But that is in schedule and still
         on_cpu, so try_to_wake_up goes into a tight loop.
      4. The runq lock of CPU n-# gets unlocked, but the message only
         gets sent to the first waiter, which is CPU n-# and that is
         busily stuck.
      5. CPU n-# never returns from the nested interruption to take and
         release the lock because the scheduler uses a busy wait.
         And CPU n never finishes the task migration because the unlock
         notification only went to CPU n-#.
      
      To avoid this and since the unlocking code has no real sense of
      which waiter is best suited to grab the lock, just send the IPI
      to all of them. This causes the waiters to return from the hyper-
      call (those not interrupted at least) and do active spinlocking.
      
      BugLink: http://bugs.launchpad.net/bugs/1011792Acked-by: NJan Beulich <JBeulich@suse.com>
      Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      76eaca03
    • K
      xen/smp: Move the common CPU init code a bit to prep for PVH patch. · dacd45f4
      Konrad Rzeszutek Wilk 提交于
      The PV and PVH code CPU init code share some functionality. The
      PVH code ("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus")
      sets some of these up, but not all. To make it easier to read, this
      patch removes the PV specific out of the generic way.
      
      No functional change - just code movement.
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v2: Fixed compile errors noticed by Fengguang Wu build system]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      dacd45f4
  16. 15 2月, 2013 2 次提交
  17. 14 2月, 2013 1 次提交
    • J
      x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. · 13d2b4d1
      Jan Beulich 提交于
      This fixes CVE-2013-0228 / XSA-42
      
      Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
      in 32bit PV guest can use to crash the > guest with the panic like this:
      
      -------------
      general protection fault: 0000 [#1] SMP
      last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
      Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
      iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
      xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
      mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
      unloaded: scsi_wait_scan]
      
      Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
      EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
      EIP is at xen_iret+0x12/0x2b
      EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
      ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
       DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
      Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
      Stack:
       00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
      Call Trace:
      Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
      8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
      10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
      EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
      general protection fault: 0000 [#2]
      ---[ end trace ab0d29a492dcd330 ]---
      Kernel panic - not syncing: Fatal exception
      Pid: 1250, comm: r Tainted: G      D    ---------------
      2.6.32-356.el6.i686 #1
      Call Trace:
       [<c08476df>] ? panic+0x6e/0x122
       [<c084b63c>] ? oops_end+0xbc/0xd0
       [<c084b260>] ? do_general_protection+0x0/0x210
       [<c084a9b7>] ? error_code+0x73/
      -------------
      
      Petr says: "
       I've analysed the bug and I think that xen_iret() cannot cope with
       mangled DS, in this case zeroed out (null selector/descriptor) by either
       xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
       entry was invalidated by the reproducer. "
      
      Jan took a look at the preliminary patch and came up a fix that solves
      this problem:
      
      "This code gets called after all registers other than those handled by
      IRET got already restored, hence a null selector in %ds or a non-null
      one that got loaded from a code or read-only data descriptor would
      cause a kernel mode fault (with the potential of crashing the kernel
      as a whole, if panic_on_oops is set)."
      
      The way to fix this is to realize that the we can only relay on the
      registers that IRET restores. The two that are guaranteed are the
      %cs and %ss as they are always fixed GDT selectors. Also they are
      inaccessible from user mode - so they cannot be altered. This is
      the approach taken in this patch.
      
      Another alternative option suggested by Jan would be to relay on
      the subtle realization that using the %ebp or %esp relative references uses
      the %ss segment.  In which case we could switch from using %eax to %ebp and
      would not need the %ss over-rides. That would also require one extra
      instruction to compensate for the one place where the register is used
      as scaled index. However Andrew pointed out that is too subtle and if
      further work was to be done in this code-path it could escape folks attention
      and lead to accidents.
      Reviewed-by: NPetr Matousek <pmatouse@redhat.com>
      Reported-by: NPetr Matousek <pmatouse@redhat.com>
      Reviewed-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      13d2b4d1
  18. 10 2月, 2013 2 次提交
    • L
      x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag · 27be4570
      Len Brown 提交于
      Remove 32-bit x86 a cmdline param "no-hlt",
      and the cpuinfo_x86.hlt_works_ok that it sets.
      
      If a user wants to avoid HLT, then "idle=poll"
      is much more useful, as it avoids invocation of HLT
      in idle, while "no-hlt" failed to do so.
      
      Indeed, hlt_works_ok was consulted in only 3 places.
      
      First, in /proc/cpuinfo where "hlt_bug yes"
      would be printed if and only if the user booted
      the system with "no-hlt" -- as there was no other code
      to set that flag.
      
      Second, check_hlt() would not invoke halt() if "no-hlt"
      were on the cmdline.
      
      Third, it was consulted in stop_this_cpu(), which is invoked
      by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
      all cases where the machine is being shutdown/reset.
      The flag was not consulted in the more frequently invoked
      play_dead()/hlt_play_dead() used in processor offline and suspend.
      
      Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
      indicating that it would be removed in 2012.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      27be4570
    • L
      xen idle: make xen-specific macro xen-specific · 6a377ddc
      Len Brown 提交于
      This macro is only invoked by Xen,
      so make its definition specific to Xen.
      
      > set_pm_idle_to_default()
      < xen_set_default_idle()
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: xen-devel@lists.xensource.com
      6a377ddc
  19. 24 1月, 2013 1 次提交
  20. 16 1月, 2013 1 次提交
  21. 18 12月, 2012 2 次提交
  22. 01 12月, 2012 1 次提交
  23. 30 11月, 2012 1 次提交
  24. 29 11月, 2012 2 次提交