1. 20 Aug, 2013 · 1 commit
  2. 09 Aug, 2013 · 2 commits
    • xen/p2m: avoid unnecessary TLB flush in m2p_remove_override() · 65a45fa2
      Committed by David Vrabel
      In m2p_remove_override() when removing the grant map from the kernel
      mapping and replacing with a mapping to the original page, the grant
      unmap will already have flushed the TLB and it is not necessary to do
      it again after updating the mapping.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    • xen: Support 64-bit PV guest receiving NMIs · 6efa20e4
      Committed by Konrad Rzeszutek Wilk
      This is based on a patch that Zhenzhong Duan had sent - which
      was missing some of the remaining pieces. The kernel has the
      logic to handle Xen-type-exceptions using the paravirt interface
      in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
      pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN -
      pv_cpu_ops.iret).
      
      That means the nmi handler (and other exception handlers) use
      the hypervisor iret.
      
      The other changes necessary for this would be to translate
      the NMI_VECTOR to one of the entries in the ipi_vector and to
      make xen_send_IPI_mask_allbutself use different events.
      
      Fortunately for us commit 1db01b49
      (xen: Clean up apic ipi interface) implemented this and we piggyback
      on the cleanup such that the apic IPI interface will pass the right
      vector value for NMI.
      
      With this patch we can trigger NMIs within a PV guest (only tested
      x86_64).
      
      For this to work with normal PV guests (not initial domain)
      we need the domain to be able to use the APIC ops - they are
      already implemented to use the Xen event channels. For that
      to be turned on in a PV domU we need to remove the masking
      of X86_FEATURE_APIC.
      
      Incidentally that means kgdb will also now work within
      a PV guest without using the 'nokgdbroundup' workaround.
      
      Note that the 32-bit version is different and this patch
      does not enable that.
      
      CC: Lisa Nguyen <lisa@xenapiadmin.com>
      CC: Ben Guthro <benjamin.guthro@citrix.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Fixed up per David Vrabel comments]
      Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
  3. 15 Jul, 2013 · 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up(), which are arch independent
      (kernel/cpu.c), are flagged as __cpuinit; so if we remove the __cpuinit
      from arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  4. 29 Jun, 2013 · 3 commits
    • x86: xen: Sync the CMOS RTC as well as the Xen wallclock · 47433b8c
      Committed by David Vrabel
      Adjustments to Xen's persistent clock via update_persistent_clock()
      don't actually persist, as the Xen wallclock is a software only clock
      and modifications to it do not modify the underlying CMOS RTC.
      
      The x86_platform.set_wallclock hook is there to keep the hardware RTC
      synchronized. On a guest this is pointless.
      
      On Dom0 we can use the native implementation which actually updates the
      hardware RTC, but we still need to keep the software emulation of RTC
      for the guests up to date. The subscription to the pvclock_notifier
      allows us to emulate this easily. The notifier is called at every tick
      and when the clock was set.
      
      Right now we only use that notifier when the clock was set, but due to
      the fact that it is called periodically from the timekeeping update
      code, we can utilize it to emulate the NTP driven drift compensation
      of update_persistent_clock() for the Xen wall (software) clock.
      
      Add an 11-minute periodic update to the pvclock_gtod notifier callback
      to achieve that. The static variable 'next', which maintains that
      11-minute update cycle, is protected by the core code serialization, so
      there is no need to add a Xen-specific serialization mechanism.
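      A minimal user-space sketch of the notifier logic described above. It
      assumes time is reduced to whole seconds. The callback pushes the
      wallclock whenever the clock was set, and otherwise at most once per
      11 minutes. gtod_notify_sketch() and push_wallclock_sketch() are
      hypothetical names; the real callback works on struct timespec and
      issues a Xen hypercall.

```c
#include <assert.h>
#include <stdbool.h>

#define SYNC_INTERVAL_SEC (11 * 60)   /* the 11-minute cycle */

static long next_sync;                /* plays the role of the static 'next' */
static int  wallclock_pushes;         /* counts simulated hypercalls */

/* Hypothetical stand-in for the (expensive) hypercall that copies the
 * kernel's time into the Xen wallclock. */
static void push_wallclock_sketch(long now_sec)
{
    (void)now_sec;
    wallclock_pushes++;
}

/* Invoked on every tick (was_set == false) and when the clock is set
 * (was_set == true). No locking: callers are assumed to be serialized
 * by the core timekeeping code, as the changelog states. */
static void gtod_notify_sketch(long now_sec, bool was_set)
{
    if (!was_set && now_sec < next_sync)
        return;                       /* the common, cheap path */
    push_wallclock_sketch(now_sec);
    next_sync = now_sec + SYNC_INTERVAL_SEC;
}
```

      Most ticks take the early return, so the hypercall cost is paid only
      when the clock is set or when an 11-minute window elapses.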
      
      [ tglx: Massaged changelog and added a few comments ]
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: xen: Sync the wallclock when the system time is set · 5584880e
      Committed by David Vrabel
      Currently the Xen wallclock is only updated every 11 minutes if NTP is
      synchronized to its clock source (using the sync_cmos_clock() work).
      If a guest is started before NTP is synchronized it may see an
      incorrect wallclock time.
      
      Use the pvclock_gtod notifier chain to receive a notification when the
      system time has changed and update the wallclock to match.
      
      This chain is called on every timer tick and we want to avoid an extra
      (expensive) hypercall on every tick.  Because dom0 has historically
      never provided a very accurate wallclock and guests do not expect one,
      we can do this simply: the wallclock is only updated if the clock was
      set.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • xen/time: remove blocked time accounting from xen "clockchip" · 0b0c002c
      Committed by Laszlo Ersek
      ... because the "clock_event_device framework" already accounts for idle
      time through the "event_handler" function pointer in
      xen_timer_interrupt().
      
      The patch is intended as the completion of [1]. It should fix the double
      idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
      stolen time accounting (the removed code seems to be isolated).
      
      The approach may be completely misguided.
      
      [1] https://lkml.org/lkml/2011/10/6/10
      [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
      
      John took the time to retest this patch on top of v3.10 and reported:
      "idle time is correctly incremented for pv and hvm for the normal
      case, nohz=off and nohz=idle." so let's put this patch in.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: John Haxby <john.haxby@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  5. 10 Jun, 2013 · 9 commits
    • xen/time: Free onlined per-cpu data structure if we want to online it again. · 09e99da7
      Committed by Konrad Rzeszutek Wilk
      If the per-cpu time data structure has already been onlined and
      we are trying to online it again, free the previous copy
      before blindly overwriting it.
      
      A developer naturally should not call this function multiple times,
      but this handles that case just in case.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/time: Check that the per_cpu data structure has data before freeing. · a05e2c37
      Committed by Konrad Rzeszutek Wilk
      We don't check whether the per_cpu data structure has already
      been freed. This patch adds that check, and if the structure has
      already been freed, it simply continues without double-freeing.
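      The two xen/time fixes above (09e99da7 and a05e2c37) follow one pattern:
      the online path releases any existing copy before allocating a new one, and
      the offline path skips a slot that is already empty. Below is a minimal
      user-space sketch of that pattern, assuming a plain array stands in for the
      per-cpu variable. The *_sketch helpers and NR_CPUS_SKETCH are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical per-CPU slot; NULL means "not onlined" or "already freed". */
static void *percpu_time_data[NR_CPUS_SKETCH];

static void time_data_offline_sketch(int cpu)
{
    if (percpu_time_data[cpu] == NULL)
        return;                         /* already freed: no double free */
    free(percpu_time_data[cpu]);
    percpu_time_data[cpu] = NULL;
}

static int time_data_online_sketch(int cpu, size_t size)
{
    /* Onlined twice? Release the old copy instead of overwriting it. */
    time_data_offline_sketch(cpu);
    percpu_time_data[cpu] = calloc(1, size);
    return percpu_time_data[cpu] != NULL ? 0 : -1;
}
```

      Resetting the slot to NULL after freeing is what makes both functions safe
      to call repeatedly.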
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/time: Don't leak interrupt name when offlining. · c9d76a24
      Committed by Konrad Rzeszutek Wilk
      When the user does:
          echo 0 > /sys/devices/system/cpu/cpu1/online
          echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      One of the leaks is from xen/time:
      
      unreferenced object 0xffff88003fa51280 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
          [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes it by stashing away the 'name' in the per-cpu
      data structure and freeing it when offlining the CPU.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/time: Encapsulate the struct clock_event_device in another structure. · 31620a19
      Committed by Konrad Rzeszutek Wilk
      We don't do any code movement. We just encapsulate the struct clock_event_device
      in a new structure which contains said structure and a char *name
      pointer. The 'name' will be used in 'xen/time: Don't leak interrupt
      name when offlining'.
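      A minimal user-space sketch of the wrapper idea, combined with the leak fix
      it enables (c9d76a24): the wrapper owns the heap-allocated "timer%d" string,
      and the offline path frees it. It assumes a two-field stand-in for struct
      clock_event_device, and it uses snprintf() plus malloc() in place of
      kasprintf(). All *_sketch names and the IRQ number are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Tiny stand-in for the kernel's struct clock_event_device. */
struct clock_event_device_sketch {
    const char *name;
    int irq;
};

/* Wrapper in the spirit of 31620a19: the device plus the heap buffer that
 * owns its "timer%d" name (field names are illustrative assumptions). */
struct xen_clock_event_device_sketch {
    struct clock_event_device_sketch evt;
    char *name;
};

static struct xen_clock_event_device_sketch xen_clock_events[4];

static int setup_timer_sketch(int cpu)
{
    struct xen_clock_event_device_sketch *xevt = &xen_clock_events[cpu];
    int len = snprintf(NULL, 0, "timer%d", cpu);

    if (len < 0)
        return -1;
    xevt->name = malloc((size_t)len + 1);      /* kasprintf() analogue */
    if (xevt->name == NULL)
        return -1;
    snprintf(xevt->name, (size_t)len + 1, "timer%d", cpu);
    xevt->evt.name = xevt->name;
    xevt->evt.irq = 100 + cpu;                 /* made-up IRQ number */
    return 0;
}

static void teardown_timer_sketch(int cpu)
{
    struct xen_clock_event_device_sketch *xevt = &xen_clock_events[cpu];

    free(xevt->name);          /* the string kmemleak reported */
    xevt->name = NULL;
    xevt->evt.name = NULL;
    xevt->evt.irq = -1;
}
```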
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/spinlock: Don't leak interrupt name when offlining. · 354e7b76
      Committed by Konrad Rzeszutek Wilk
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51260 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00  spinlock1.......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
          [<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining"
      patch did (which has a per-cpu structure containing both the
      IRQ number and the char *), we use a per-cpu pointer to a char (char *).
      
      The reason is that the "__this_cpu_read(lock_kicker_irq);" macro
      blows up with "__bad_size_call_parameter()" as the size of the
      returned structure is not within the parameters of what it expects
      and optimizes for.
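      A minimal sketch of why the struct breaks __this_cpu_read(), assuming the
      accessor dispatches on sizeof() as the 3.x percpu headers do: only 1-, 2-,
      4- and 8-byte objects have an implementation. pcpu_read_size_ok() and
      struct xen_common_irq_sketch are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct xen_common_irq_sketch {      /* IRQ number plus name, as in xen/smp */
    int irq;
    char *name;
};

/* Mimics the sizeof() dispatch behind __this_cpu_read(): only 1-, 2-,
 * 4- and 8-byte objects are supported. In the kernel the default case
 * refers to __bad_size_call_parameter(), which is never defined, so the
 * build fails; this sketch just returns 0. */
static int pcpu_read_size_ok(size_t size)
{
    switch (size) {
    case 1: case 2: case 4: case 8:
        return 1;
    default:
        return 0;
    }
}
```

      On x86_64, sizeof(struct xen_common_irq_sketch) is 16: a 4-byte int, 4 bytes
      of padding, and an 8-byte pointer. It therefore falls into the default case,
      while a lone char * (8 bytes) is accepted.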
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Don't leak interrupt name when offlining. · b85fffec
      Committed by Konrad Rzeszutek Wilk
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51240 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
          [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes some of it by using the 'struct xen_common_irq->name'
      field to stash away the char * so that it can be freed when
      the interrupt line is destroyed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Set the per-cpu IRQ number to a valid default. · ee336e10
      Committed by Konrad Rzeszutek Wilk
      When we free it we want to make sure to set it to a default
      value of -1 so that we don't double-free it (in case somebody
      calls us twice).
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Introduce a common structure to contain the IRQ name and interrupt line. · 9547689f
      Committed by Konrad Rzeszutek Wilk
      This patch adds a new structure to contain the two things common
      to each of the per-cpu interrupts:
       - an interrupt number,
       - and the name of the interrupt (to be added in 'xen/smp: Don't leak
         interrupt name when offlining').
      
      This allows us to carry the tuple of the per-cpu interrupt data structure
      and expand it as we need in the future.
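      A minimal user-space sketch of how the xen/smp changes in this batch fit
      together. It shows an IRQ-plus-name tuple (9547689f), a name that is freed
      along with the line (b85fffec), a single free routine (53b94fdc), and a
      reset to -1 so that a second call is harmless (ee336e10). The struct layout
      follows the description above, but every *_sketch helper and the IRQ
      numbers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct xen_common_irq_sketch {
    int irq;        /* -1: no interrupt line bound */
    char *name;     /* owned copy of the IRQ name, freed with the line */
};

enum { IPI_RESCHED, IPI_CALLFUNC, IPI_CALLFUNC_SINGLE, NR_IPIS_SKETCH };

static struct xen_common_irq_sketch cpu_irqs[NR_IPIS_SKETCH] = {
    { -1, NULL }, { -1, NULL }, { -1, NULL },
};
static int unbind_calls;

/* Hypothetical stand-in for unbind_from_irqhandler(). */
static void unbind_sketch(int irq) { (void)irq; unbind_calls++; }

static int smp_intr_bind_sketch(int slot, int irq, const char *name)
{
    size_t len = strlen(name) + 1;

    cpu_irqs[slot].name = malloc(len);
    if (cpu_irqs[slot].name == NULL)
        return -1;
    memcpy(cpu_irqs[slot].name, name, len);
    cpu_irqs[slot].irq = irq;
    return 0;
}

/* One place that releases every per-cpu IRQ and resets it to -1. */
static void smp_intr_free_sketch(void)
{
    for (int i = 0; i < NR_IPIS_SKETCH; i++) {
        if (cpu_irqs[i].irq < 0)
            continue;                  /* never bound, or already freed */
        unbind_sketch(cpu_irqs[i].irq);
        cpu_irqs[i].irq = -1;
        free(cpu_irqs[i].name);
        cpu_irqs[i].name = NULL;
    }
}
```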
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Coalesce the free_irq calls in one function. · 53b94fdc
      Committed by Konrad Rzeszutek Wilk
      There are two functions that do a bunch of 'free_irq' on
      the per_cpu IRQ. Instead of having duplicate code just move
      it to one function.
      
      This is just code movement.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  6. 07 Jun, 2013 · 1 commit
  7. 04 Jun, 2013 · 1 commit
    • xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU. · 466318a8
      Committed by Konrad Rzeszutek Wilk
      xen_play_dead is an undead function. When the vCPU is told to
      go offline, it ends up calling xen_play_dead, wherein it calls the
      VCPUOP_down hypercall which offlines the vCPU. However, when the
      vCPU is onlined back, it resumes execution right after the
      VCPUOP_down hypercall.
      
      That was OK (albeit the API for play_dead assumes that the CPU
      stays dead and never returns), but with commit 4b0c0f29
      (tick: Cleanup NOHZ per cpu data on cpu down) it is no longer safe,
      as said commit resets ts->inidle, which was set at the start of the
      cpu_idle loop.
      
      The net effect is that we get this warn:
      
      Broke affinity for irq 16
      installing Xen timer for CPU 1
      cpu 1 spinlock event irq 48
      ------------[ cut here ]------------
      WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
      Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33a #1
      Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
       ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
       ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
       0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
      Call Trace:
       [<ffffffff816707c8>] dump_stack+0x19/0x1b
       [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
       [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
       [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
       [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
       [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
      ---[ end trace 915c8c486004dda1 ]---
      
      because ts->inidle is set to zero. Thomas suggested that we just add a workaround
      to call tick_nohz_idle_enter before returning from xen_play_dead(), and
      that is what this patch does; it fixes the issue.
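      A minimal user-space model of the sequence described above. It assumes the
      NOHZ state reduces to a single inidle flag. The idle loop enters idle, the
      CPU goes down and its tick state is wiped, play_dead "returns" on re-online,
      and the idle exit then checks the flag. Every *_sketch name is hypothetical,
      and the real kernel emits the WARNING shown above instead of incrementing a
      counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-CPU tick_sched state. */
static bool ts_inidle;
static int idle_exit_warnings;

static void tick_nohz_idle_enter_sketch(void) { ts_inidle = true; }

static void tick_nohz_idle_exit_sketch(void)
{
    if (!ts_inidle)
        idle_exit_warnings++;  /* the real code prints the WARNING above */
    ts_inidle = false;
}

/* Effect of 4b0c0f29: CPU-down cleanup resets the NOHZ per-cpu data. */
static void tick_cleanup_dead_cpu_sketch(void) { ts_inidle = false; }

/* xen_play_dead() "returns" once the vCPU is onlined again, right after
 * the VCPUOP_down hypercall. */
static void xen_play_dead_sketch(bool with_fix)
{
    tick_cleanup_dead_cpu_sketch();     /* happens while the vCPU is down */
    if (with_fix)
        tick_nohz_idle_enter_sketch();  /* the workaround from 466318a8 */
}

/* One pass of the idle loop in which the CPU is offlined and onlined. */
static void idle_loop_pass_sketch(bool with_fix)
{
    tick_nohz_idle_enter_sketch();
    xen_play_dead_sketch(with_fix);
    tick_nohz_idle_exit_sketch();
}
```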
      
      We also CC stable because git commit 4b0c0f29 is in the stable
      tree.
      
      CC: stable@vger.kernel.org
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  8. 29 May, 2013 · 2 commits
    • xen: Clean up apic ipi interface · 1db01b49
      Committed by Stefan Bader
      Commit f447d56d introduced the
      implementation of the PV apic ipi interface. But there were some
      odd things (none of which seem to cause any real issue, but they
      should probably be cleaned up anyway):
       - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
         ignores the passed-in vector and only uses the CALL_FUNCTION_SINGLE
         vector, whereas xen_send_IPI_all and xen_send_IPI_mask use the vector.
       - physflat_send_IPI_allbutself is declared unnecessarily. It is never
         used.
      
      This patch tries to clean up those things.
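      A minimal user-space sketch of the corrected "all but self" behaviour,
      assuming the CPU mask fits in an unsigned int bitmask and that delivery is
      recorded in an array. The function forwards the caller's vector instead of
      substituting CALL_FUNCTION_SINGLE. All *_sketch names and NR_CPUS_SKETCH are
      hypothetical.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8

static int last_vector[NR_CPUS_SKETCH];   /* 0 = nothing delivered yet */

/* Hypothetical stand-in for sending one IPI over a Xen event channel. */
static void send_ipi_one_sketch(int cpu, int vector) { last_vector[cpu] = vector; }

/* Deliver 'vector' to every CPU in 'mask' except 'self'. The issue noted
 * above was that 'vector' was ignored and CALL_FUNCTION_SINGLE was used. */
static void send_ipi_mask_allbutself_sketch(unsigned int mask, int self, int vector)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        if (cpu == self || !(mask & (1u << cpu)))
            continue;
        send_ipi_one_sketch(cpu, vector);
    }
}
```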
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86: Increase precision of x86_platform.get/set_wallclock() · 3565184e
      Committed by David Vrabel
      All the virtualized platforms (KVM, lguest and Xen) have persistent
      wallclocks that have more than one second of precision.
      
      read_persistent_wallclock() and update_persistent_wallclock() allow
      for nanosecond precision but their implementation on x86 with
      x86_platform.get/set_wallclock() only allows for one second precision.
      This means guests may see a wallclock time that is off by up to 1
      second.
      
      Make set_wallclock() and get_wallclock() take a struct timespec
      parameter (which allows for nanosecond precision) so KVM and Xen
      guests may start with a more accurate wallclock time and a Xen dom0
      can maintain a more accurate wallclock for guests.
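      A minimal user-space sketch of the precision loss described above. A
      seconds-only hook truncates the nanoseconds (up to one second of error),
      whereas a hook that fills in a struct timespec keeps them. The source
      timespec stands in for a hypothetical host clock reading, and the *_sketch
      helpers are illustrative, not the x86_platform callbacks themselves.

```c
#include <assert.h>
#include <time.h>

/* Old-style hook: whole seconds only, the nanoseconds are dropped. */
static time_t get_wallclock_seconds_sketch(const struct timespec *src)
{
    return src->tv_sec;
}

/* New-style hook in the spirit of 3565184e: fill in a struct timespec. */
static void get_wallclock_timespec_sketch(const struct timespec *src,
                                          struct timespec *now)
{
    *now = *src;
}

static long long timespec_to_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}
```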
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
  9. 08 May, 2013 · 3 commits
  10. 07 May, 2013 · 1 commit
  11. 06 May, 2013 · 1 commit
    • xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · 7f1fc268
      Committed by Konrad Rzeszutek Wilk
      If a user did:
      
      	echo 0 > /sys/devices/system/cpu/cpu1/online
      	echo 1 > /sys/devices/system/cpu/cpu1/online
      
      we would (this is a build with DEBUG enabled) get to:
      smpboot: ++++++++++++++++++++=_---CPU UP  1
      .. snip..
      smpboot: Stack at about ffff880074c0ff44
      smpboot: CPU1: has booted.
      
      and hang. The RCU mechanism would kick in and try to IPI CPU1,
      but the IPIs (and all other interrupts) would never arrive at
      CPU1. At first glance, at least. A bit of digging in the hypervisor
      trace (using xenanalyze) shows:
      
      [vla] d4v1 vec 243 injecting
         0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043163639 --|x d4v1 vmentry cycles 1468
      ]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043164913 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
         0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043165526 --|x d4v1 vmentry cycles 1472
      ]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043166800 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
      
      there is a pending event (subsequent debugging shows it is the IPI
      from the VCPU0 when smpboot.c on VCPU1 has done
      "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
      interrupted with the callback IPI (0xf3 aka 243) which ends up calling
      __xen_evtchn_do_upcall.
      
      The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
      the pending events. And the moment the guest does a 'cli' (that is the
      ffffffff81673254 in the log above) the hypervisor is invoked again to
      inject the IPI (0xf3) to tell the guest it has pending interrupts.
      This repeats itself forever.
      
      The culprit was the per_cpu(xen_vcpu, cpu) pointer. At bootup
      we set each per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu], but later on use VCPUOP_register_vcpu_info
      to register per-CPU structures (xen_vcpu_setup).
      This is used to allow events for more than 32 VCPUs and for performance
      optimization reasons.
      
      When the user performs the VCPU hotplug we end up calling
      xen_vcpu_setup once more. We make the hypercall, which returns
      -EINVAL as it does not allow multiple registration calls (and
      has already re-assigned where the events are being set). We pick
      the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
      However, the hypervisor is still setting events in the registered
      per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
      
      As such, when the hypervisor sets events (such as the timer one)
      and we iterate in __xen_evtchn_do_upcall, we end up reading stale
      events from the shared_info->vcpu_info[vcpu] instead of the
      per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
      events that the hypervisor has set and the hypervisor keeps on reminding
      us to ack the events which we never do.
      
      The fix is simple: when xen_vcpu_setup is called a second time, don't
      overwrite per_cpu(xen_vcpu, cpu) if it already points to
      per_cpu(xen_vcpu_info, cpu).
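      A minimal user-space model of the bug and the fix described above. It
      assumes the hypervisor accepts only one registration per vCPU and returns
      -EINVAL afterwards. Without the guard, a second setup call falls back to the
      stale shared_info slot. With the guard, the registered per-CPU area is
      kept. The arrays stand in for the per_cpu variables, and all *_sketch names
      are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 2

struct vcpu_info_sketch { unsigned long evtchn_pending_sel; };

/* shared_info->vcpu_info[]: the boot-time default location */
static struct vcpu_info_sketch shared_vcpu_info[NR_CPUS_SKETCH];
/* per_cpu(xen_vcpu_info, cpu): the area registered with the hypervisor */
static struct vcpu_info_sketch percpu_vcpu_info[NR_CPUS_SKETCH];
/* per_cpu(xen_vcpu, cpu): the pointer the upcall handler reads */
static struct vcpu_info_sketch *xen_vcpu[NR_CPUS_SKETCH];
static bool registered[NR_CPUS_SKETCH];

/* Stand-in for VCPUOP_register_vcpu_info: one registration per vCPU. */
static int register_vcpu_info_sketch(int cpu)
{
    if (registered[cpu])
        return -EINVAL;
    registered[cpu] = true;
    return 0;
}

static void xen_vcpu_setup_sketch(int cpu, bool with_fix)
{
    /* The fix: already pointing at the registered area? Leave it alone. */
    if (with_fix && xen_vcpu[cpu] == &percpu_vcpu_info[cpu])
        return;

    if (register_vcpu_info_sketch(cpu) == 0)
        xen_vcpu[cpu] = &percpu_vcpu_info[cpu];
    else
        xen_vcpu[cpu] = &shared_vcpu_info[cpu]; /* stale after hotplug */
}
```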
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  12. 17 Apr, 2013 · 9 commits
    • xen/smp: Unify some of the PVs and PVHVM offline CPU path · b12abaa1
      Committed by Konrad Rzeszutek Wilk
      "xen_cpu_die" and "xen_hvm_cpu_die" are very similar.
      Let's coalesce them.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. · 27d8b207
      Committed by Konrad Rzeszutek Wilk
      There is no need to use the PV version of the IRQ_WORKER mechanism
      as under PVHVM we are using the native version. The native
      version is using the SMP API.
      
      They just sit around unused:
      
        69:          0          0  xen-percpu-ipi       irqwork0
        83:          0          0  xen-percpu-ipi       irqwork1
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM · 70dd4998
      Committed by Konrad Rzeszutek Wilk
      See git commit f10cd522
      (xen: disable PV spinlocks on HVM) for details.
      
      But we did not disable it everywhere, which means that when
      we boot as PVHVM we end up allocating a per-CPU IRQ line for the
      spinlock. This fixes that.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/spinlock: Check against default value of -1 for IRQ line. · cb9c6f15
      Committed by Konrad Rzeszutek Wilk
      The default (uninitialized) value of the IRQ line is -1.
      Check whether we have already allocated a spinlock interrupt line
      and whether somebody is trying to do so again. Also set it to -1
      when we offline the CPU.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/time: Add default value of -1 for IRQ and check for that. · ef35a4e6
      Committed by Konrad Rzeszutek Wilk
      If the timer interrupt has been de-initialized or is just now being
      initialized, the default value of -1 should be preset as the
      interrupt line. Check for that, and WARN us if something is
      odd.
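      A minimal user-space sketch of the -1 sentinel check, assuming a plain array
      of per-CPU timer IRQs and a fake allocator. Setup complains, in the manner
      of WARN(), if a line is already present. Reusing the existing line at that
      point is a choice made for this sketch only; the changelog does not say
      what the real code does after warning. All *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int timer_irq[4] = { -1, -1, -1, -1 };  /* -1: no line allocated */
static int next_irq = 64;                     /* fake IRQ allocator */

/* Returns the IRQ for 'cpu'. If a line is already present, warn instead
 * of silently allocating (and leaking) another one. */
static int setup_timer_irq_sketch(int cpu)
{
    if (timer_irq[cpu] >= 0) {
        fprintf(stderr, "WARN: IRQ%d for CPU%d is already allocated\n",
                timer_irq[cpu], cpu);
        return timer_irq[cpu];
    }
    timer_irq[cpu] = next_irq++;
    return timer_irq[cpu];
}

static void teardown_timer_irq_sketch(int cpu)
{
    timer_irq[cpu] = -1;   /* back to the default so re-online works */
}
```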
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/time: Fix kasprintf splat when allocating timer%d IRQ line. · 7918c92a
      Committed by Konrad Rzeszutek Wilk
      When we online the CPU, we get this splat:
      
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      installing Xen timer for CPU 1
      BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
      Call Trace:
       [<ffffffff810c1fea>] __might_sleep+0xda/0x100
       [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff813036eb>] kvasprintf+0x5b/0x90
       [<ffffffff81303758>] kasprintf+0x38/0x40
       [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
       [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
       [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
      
      The solution is to use kasprintf in the CPU hotplug path
      that onlines the CPU. That is, do it in xen_hvm_cpu_notify,
      and remove the call in xen_hvm_setup_cpu_clockevents.
      
      Unfortunately the latter is not a good idea, as the bootup path
      does not use xen_hvm_cpu_notify, so we would end up never allocating
      timer%d interrupt lines when booting. As such, add the check for
      atomic() to continue.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline · 66ff0fe9
      Committed by Konrad Rzeszutek Wilk
      While we don't use the spinlock interrupt line (see commit f10cd522,
      "xen: disable PV spinlocks on HVM", for details), we should still do
      the proper init / deinit sequence. We did not do that correctly: for
      the CPU init of a PVHVM guest we would allocate an interrupt line but
      fail to deallocate the old interrupt line.
      
      This resulted in leakage of an irq_desc but more importantly this splat
      as we online an offlined CPU:
      
      genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
      Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
      Call Trace:
       [<ffffffff811156de>] __setup_irq+0x23e/0x4a0
       [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
       [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff810cad35>] ? update_max_interval+0x15/0x40
       [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
       [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
       [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
       [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff8109402b>] __cpu_notify+0x1b/0x30
       [<ffffffff8166834a>] _cpu_up+0xa0/0x14b
       [<ffffffff816684ce>] cpu_up+0xd9/0xec
       [<ffffffff8165f754>] store_online+0x94/0xd0
       [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
       [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
       [<ffffffff811a2864>] vfs_write+0xb4/0x130
       [<ffffffff811a302a>] sys_write+0x5a/0xa0
       [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
      cpu 1 spinlock event irq -16
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      
      And if one looks at the /proc/interrupts right after
      offlining (CPU1):
      
        70:          0          0  xen-percpu-ipi       spinlock0
        71:          0          0  xen-percpu-ipi       spinlock1
        77:          0          0  xen-percpu-ipi       spinlock2
      
      There is the oddity of the 'spinlock1' still being present.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. · 888b65b4
      Committed by Konrad Rzeszutek Wilk
      In the PVHVM path, when we do CPU online/offline, we would
      leak the timer%d IRQ line every time we do an offline event. The
      online path (xen_hvm_setup_cpu_clockevents via
      x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
      line for the timer%d.
      
      But we would still use the old interrupt line leading to:
      
      kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
      invalid opcode: 0000 [#1] SMP
      RIP: 0010:[<ffffffff810b9e21>]  [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
      .. snip..
       <IRQ>
       [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
       [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
       [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
       [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
       [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
       [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
       [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
       <EOI>
       [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
       [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
      
      There is also the oddity of timer1 still being present in
      /proc/interrupts after offlining CPU1:
      
        64:       1121          0  xen-percpu-virq      timer0
        78:          0          0  xen-percpu-virq      timer1
        84:          0       2483  xen-percpu-virq      timer2
      
      This patch fixes it.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      888b65b4
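      The leak described in this commit follows a simple bookkeeping pattern: every online allocates a fresh timer%d line, and unless the offline path releases the previous one, allocations pile up (as the leftover timer1 entry shows). The C sketch below models only that bookkeeping with a hypothetical allocator; `sketch_alloc_irq`, `sketch_free_irq`, `sketch_irqs_in_use` and `sketch_cpu_cycle` are illustrative names, not kernel APIs, and the stale-handler crash itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_IRQS 64

/* Hypothetical IRQ allocator: true means the line is allocated. */
static bool irq_in_use[SKETCH_MAX_IRQS];

/* Allocate the lowest free line, or return -1 if none is left. */
static int sketch_alloc_irq(void)
{
    for (int irq = 0; irq < SKETCH_MAX_IRQS; irq++) {
        if (!irq_in_use[irq]) {
            irq_in_use[irq] = true;
            return irq;
        }
    }
    return -1;
}

static void sketch_free_irq(int irq)
{
    assert(irq >= 0 && irq < SKETCH_MAX_IRQS && irq_in_use[irq]);
    irq_in_use[irq] = false;
}

/* Count how many lines are currently allocated. */
static int sketch_irqs_in_use(void)
{
    int n = 0;
    for (int irq = 0; irq < SKETCH_MAX_IRQS; irq++)
        n += irq_in_use[irq];
    return n;
}

/* One offline+online cycle for a CPU whose timer line is *timer_irq.
 * Without teardown the old line is never freed (the leak); with
 * teardown it is released before the online path allocates anew. */
static void sketch_cpu_cycle(int *timer_irq, bool teardown_on_offline)
{
    if (teardown_on_offline)
        sketch_free_irq(*timer_irq);  /* offline: release timer%d */
    *timer_irq = sketch_alloc_irq();  /* online: allocate timer%d */
}
```

      Five cycles without teardown leave six lines allocated for a single CPU; further cycles with teardown keep the count constant.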
    • D
      x86/xen: populate boot_params with EDD data · 96f28bc6
      David Vrabel committed
      During early setup of a dom0 kernel, populate boot_params with the
      Enhanced Disk Drive (EDD) and MBR signature data.  This makes
      information on the BIOS boot device available in /sys/firmware/edd/.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      96f28bc6
  13. 12 Apr 2013, 2 commits
  14. 11 Apr 2013, 1 commit
  15. 08 Apr 2013, 1 commit
  16. 03 Apr 2013, 1 commit
    • K
      xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen provided pagetables. · b2222794
      Konrad Rzeszutek Wilk committed
      Occasionally, on a DL380 G4, the guest would crash quite early with this:
      
      (XEN) d244:v0: unhandled page fault (ec=0003)
      (XEN) Pagetable walk from ffffffff84dc7000:
      (XEN)  L4[0x1ff] = 00000000c3f18067 0000000000001789
      (XEN)  L3[0x1fe] = 00000000c3f14067 000000000000178d
      (XEN)  L2[0x026] = 00000000dc8b2067 0000000000004def
      (XEN)  L1[0x1c7] = 00100000dc8da067 0000000000004dc7
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 244 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.1.3OVM  x86_64  debug=n  Not tainted ]----
      (XEN) CPU:    3
      (XEN) RIP:    e033:[<ffffffff81263f22>]
      (XEN) RFLAGS: 0000000000000216   EM: 1   CONTEXT: pv guest
      (XEN) rax: 0000000000000000   rbx: ffffffff81785f88   rcx: 000000000000003f
      (XEN) rdx: 0000000000000000   rsi: 00000000dc8da063   rdi: ffffffff84dc7000
      
      The offending code is a loop that writes the value zero (%rax)
      to the page pointed to by %rdi (the L4 provided by Xen):
      
         0: 44 00 00             add    %r8b,(%rax)
         3: 31 c0                 xor    %eax,%eax
         5: b9 40 00 00 00       mov    $0x40,%ecx
         a: 66 0f 1f 84 00 00 00 nopw   0x0(%rax,%rax,1)
        11: 00 00
        13: ff c9                 dec    %ecx
        15:* 48 89 07             mov    %rax,(%rdi)     <-- trapping instruction
        18: 48 89 47 08           mov    %rax,0x8(%rdi)
        1c: 48 89 47 10           mov    %rax,0x10(%rdi)
      
      which fails. xen_setup_kernel_pagetable recycles some of Xen's
      page-table entries once it has switched over to its Linux page-tables.
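      For readers less at home in AT&T syntax, here is a minimal C sketch of what the loop computes. The disassembly above is cut off after three stores, so the remaining five stores and the 64-byte pointer advance are assumed from the numbers it does show: %ecx starts at 0x40 (64) and 64 iterations must cover a 4096-byte page, i.e. eight quadwords per iteration. `sketch_clear_page` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* C rendering of the unrolled loop: each iteration stores zero into
 * eight consecutive quadwords (64 bytes), 64 times, clearing the page. */
static void sketch_clear_page(uint64_t *page)
{
    assert(page);
    for (unsigned int iter = SKETCH_PAGE_SIZE / 64; iter != 0; iter--) {
        page[0] = 0; /* mov %rax,(%rdi): the trapping store */
        page[1] = 0; /* mov %rax,0x8(%rdi) */
        page[2] = 0; /* mov %rax,0x10(%rdi) */
        page[3] = 0; /* assumed: remaining stores are not shown above */
        page[4] = 0;
        page[5] = 0;
        page[6] = 0;
        page[7] = 0;
        page += 8;   /* advance %rdi by 64 bytes */
    }
}
```

      Because the very first store targets the first quadword of the page, a page the CPU still treats as read-only faults immediately, which matches the trapping instruction marked in the listing.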
      
      Right before we try to clear the page, we make a hypercall to change
      it from _RO to _RW, and that works (otherwise we would hit a BUG()).
      The _RW flag is indeed set for that page:
      (XEN)  L1[0x1c7] = 001000004885f067 0000000000004dc7
      
      The error code is 3, i.e. PFEC_page_present | PFEC_write_access: the
      page is present (correct), and we tried to write to it, but a
      protection violation occurred. One theory is that the page-table
      entries cached in hardware (the TLB) are not up to date with what we
      just set, especially as we have just done a CR3 write and flushed the
      multicalls.
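      Decoding the error code mechanically makes that reasoning explicit. The sketch assumes the standard x86 page-fault error-code layout (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch); the first two macro names mirror the ones used in the message, and `sketch_is_write_protect_fault` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (architectural layout). */
#define PFEC_page_present (1U << 0) /* fault on a present page (protection) */
#define PFEC_write_access (1U << 1) /* faulting access was a write */
#define PFEC_user_mode    (1U << 2) /* fault taken in user mode */
#define PFEC_reserved_bit (1U << 3) /* reserved bit set in a paging entry */
#define PFEC_insn_fetch   (1U << 4) /* faulting access was an instruction fetch */

/* True for a write to a page that is present but, according to the
 * translation the CPU used, not writable. */
static bool sketch_is_write_protect_fault(uint32_t ec)
{
    return (ec & PFEC_page_present) && (ec & PFEC_write_access) &&
           !(ec & PFEC_reserved_bit);
}
```

      With `ec = 3` the helper returns true: a write hit a present page that the CPU considered read-only, even though the page-table entry had already been switched to _RW.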
      
      This patch solves the problem by flushing the TLB entry for the page
      after changing it from _RO to _RW; with it, we no longer hit this
      issue.
      
      Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO
      'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4]
      Reported-and-Tested-by: Saar Maoz <Saar.Maoz@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b2222794
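      The theory in this commit, and why a flush fixes it, can be illustrated with a toy model in which the TLB caches each page's writable bit. This is a sketch of the idea only: `struct sketch_mmu` and its helpers are hypothetical, and `sketch_flush_one` stands in for flushing the TLB entry of a single page rather than for the specific mechanism the patch uses.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NPAGES 8

/* Toy address space: page tables are the source of truth; the TLB
 * caches a copy of each page's writable bit once it is used. */
struct sketch_mmu {
    bool pte_rw[SKETCH_NPAGES];    /* writable bit in the page table */
    bool tlb_valid[SKETCH_NPAGES]; /* is a translation cached? */
    bool tlb_rw[SKETCH_NPAGES];    /* writable bit as cached */
};

/* Check a write: use the cached translation if present, otherwise
 * walk the page table and fill the TLB. false means a fault. */
static bool sketch_write_ok(struct sketch_mmu *m, int pfn)
{
    assert(pfn >= 0 && pfn < SKETCH_NPAGES);
    if (!m->tlb_valid[pfn]) {
        m->tlb_rw[pfn] = m->pte_rw[pfn];
        m->tlb_valid[pfn] = true;
    }
    return m->tlb_rw[pfn];
}

/* RO -> RW change in the page table only; the cached copy is untouched. */
static void sketch_set_rw(struct sketch_mmu *m, int pfn)
{
    assert(pfn >= 0 && pfn < SKETCH_NPAGES);
    m->pte_rw[pfn] = true;
}

/* Drop the cached translation for one page. */
static void sketch_flush_one(struct sketch_mmu *m, int pfn)
{
    assert(pfn >= 0 && pfn < SKETCH_NPAGES);
    m->tlb_valid[pfn] = false;
}
```

      In the model, a write after `sketch_set_rw` still fails because the stale read-only entry is used; after `sketch_flush_one` the next write re-reads the page table and succeeds.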
  17. 28 Mar 2013, 1 commit