1. 10 10月, 2013 1 次提交
    • F
      xen: Fix possible user space selector corruption · 7cde9b27
      Frediano Ziglio 提交于
      Due to the way kernel is initialized under Xen is possible that the
      ring1 selector used by the kernel for the boot cpu end up to be copied
      to userspace leading to segmentation fault in the userspace.
      
      Xen code in the kernel initialize no-boot cpus with correct selectors (ds
      and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
      On task context switch (switch_to) we assume that ds, es and cs already
      point to __USER_DS and __KERNEL_CSso these selector are not changed.
      
      If processor is an Intel that support sysenter instruction sysenter/sysexit
      is used so ds and es are not restored switching back from kernel to
      userspace. In the case the selectors point to a ring1 instead of __USER_DS
      the userspace code will crash on first memory access attempt (to be
      precise Xen on the emulated iret used to do sysexit will detect and set ds
      and es to zero which lead to GPF anyway).
      
      Now if an userspace process call kernel using sysenter and get rescheduled
      (for me it happen on a specific init calling wait4) could happen that the
      ring1 selector is set to ds and es.
      
      This is quite hard to detect cause after a while these selectors are fixed
      (__USER_DS seems sticky).
      
      Bisecting the code commit 7076aada appears
      to be the first one that have this issue.
      Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      7cde9b27
  2. 25 9月, 2013 2 次提交
    • D
      xen/p2m: check MFN is in range before using the m2p table · 0160676b
      David Vrabel 提交于
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that is error cannot lookup in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around and the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address), tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existant parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      0160676b
    • K
      xen: Do not enable spinlocks before jump_label_init() has executed · a945928e
      Konrad Rzeszutek Wilk 提交于
      xen_init_spinlocks() currently calls static_key_slow_inc() before
      jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
      the case) the effect of this static_key_slow_inc() is deferred until after
      jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
      which case the key is set immediately. Thus, depending on the value of config
      option, we may observe different behavior.
      
      In addition, when we come to __jump_label_transform() from jump_label_init(),
      the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
      ideal_nop is not the same as default_nop this will cause a BUG() since it is
      expected that before a key is enabled the latter is replaced by the former
      during initialization.
      
      To address this problem we need to move
      static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
      after jump_label_init(). We also need to make sure that this is done before
      other cpus start to boot. early_initcall appears to be  a good place to do so.
      (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
      need to be set before alternative_instructions() runs.)
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v2: Added extra comments in the code]
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      a945928e
  3. 10 9月, 2013 6 次提交
  4. 09 9月, 2013 1 次提交
  5. 22 8月, 2013 1 次提交
  6. 21 8月, 2013 1 次提交
    • V
      xen/pvhvm: Initialize xen panic handler for PVHVM guests · 669b0ae9
      Vaughan Cao 提交于
      kernel use callback linked in panic_notifier_list to notice others when panic
      happens.
      NORET_TYPE void panic(const char * fmt, ...){
          ...
          atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
      }
      When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
      send out an event with reason code - SHUTDOWN_crash.
      
      xen_panic_handler_init() is defined to register on panic_notifier_list but
      we only call it in xen_arch_setup which only be called by PV, this patch is
      necessary for PVHVM.
      
      Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config
      file won't lead a vmcore to be generate when the guest panics. It can be
      reproduced with 'echo c > /proc/sysrq-trigger'.
      Signed-off-by: NVaughan Cao <vaughan.cao@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NJoe Jin <joe.jin@oracle.com>
      669b0ae9
  7. 20 8月, 2013 5 次提交
    • S
      xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping · ee072640
      Stefano Stabellini 提交于
      GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0
      mapping instead of reinstating the original mapping.
      Doing so separately would be racy.
      
      To unmap a grant and reinstate the original mapping atomically we use
      GNTTABOP_unmap_and_replace.
      GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so
      don't use it for kmaps.  GNTTABOP_unmap_and_replace zeroes the mapping
      passed in new_addr so we have to reinstate it, however that is a
      per-cpu mapping only used for balloon scratch pages, so we can be sure that
      it's not going to be accessed while the mapping is not valid.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: alex@alex.org.uk
      CC: dcrisan@flexiant.com
      
      [v1: Konrad fixed up the conflicts]
      Conflicts:
      	arch/x86/xen/p2m.c
      ee072640
    • C
      xen/smp: initialize IPI vectors before marking CPU online · fc78d343
      Chuck Anderson 提交于
      An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      RCU has detected that a CPU has not entered a quiescent state within the
      grace period.  It needs to send the CPU a reschedule IPI if it is not
      offline.  rcu_implicit_offline_qs() does this check:
      
      	/*
      	 * If the CPU is offline, it is in a quiescent state.  We can
      	 * trust its state not to change because interrupts are disabled.
      	 */
      	if (cpu_is_offline(rdp->cpu)) {
      		rdp->offline_fqs++;
      		return 1;
      	}
      
      	Else the CPU is online.  Send it a reschedule IPI.
      
      The CPU is in the middle of being hot-plugged and has been marked online
      (!cpu_is_offline()).  See start_secondary():
      
      	set_cpu_online(smp_processor_id(), true);
      	...
      	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
      
      start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
      mark it as active:
      
      	/*
      	 * Wait until the cpu which brought this one up marked it
      	 * online before enabling interrupts. If we don't do that then
      	 * we can end up waking up the softirq thread before this cpu
      	 * reached the active state, which makes the scheduler unhappy
      	 * and schedule the softirq thread on the wrong cpu. This is
      	 * only observable with forced threaded interrupts, but in
      	 * theory it could also happen w/o them. It's just way harder
      	 * to achieve.
      	 */
      	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
      		cpu_relax();
      
      	/* enable local interrupts */
      	local_irq_enable();
      
      The CPU being hot-plugged will be marked active after it has been fully
      initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
      xen_smp_intr_init() is called to set up the hot-plugged vCPU's
      XEN_RESCHEDULE_VECTOR.
      
      The hot-plugging CPU is marked online, not marked active and does not have
      its IPI vectors set up.  rcu_implicit_offline_qs() sees the hot-plugging
      cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
      This will lead to:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      	xen_send_IPI_one()
      	xen_smp_send_reschedule()
      	rcu_implicit_offline_qs()
      	rcu_implicit_dynticks_qs()
      	force_qs_rnp()
      	force_quiescent_state()
      	__rcu_process_callbacks()
      	rcu_process_callbacks()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      
      because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
      the XEN_RESCHEDULE_VECTOR.
      
      There is at least one other place that has caused the same crash:
      
      	xen_smp_send_reschedule()
      	wake_up_idle_cpu()
      	add_timer_on()
      	clocksource_watchdog()
      	call_timer_fn()
      	run_timer_softirq()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      	xen_hvm_callback_vector()
      
      clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
      a watchdog timer:
      
      	/*
      	 * Cycle through CPUs to check if the CPUs stay synchronized
      	 * to each other.
      	 */
      	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
      	if (next_cpu >= nr_cpu_ids)
      		next_cpu = cpumask_first(cpu_online_mask);
      	watchdog_timer.expires += WATCHDOG_INTERVAL;
      	add_timer_on(&watchdog_timer, next_cpu);
      
      This resulted in an attempt to send an IPI to a hot-plugging CPU that
      had not initialized its reschedule vector. One option would be to make
      the RCU code check to not check for CPU offline but for CPU active.
      As becoming active is done after a CPU is online (in older kernels).
      
      But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
      completely reworked - in the online path, cpu_active is set *before* cpu_online,
      and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
      notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
      path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
      to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
      is better left to the scheduler alone, since it has the intelligence to use it
      judiciously."
      
      The conclusion was that:
      "
      1. At the IPI sender side:
      
         It is incorrect to send an IPI to an offline CPU (cpu not present in
         the cpu_online_mask). There are numerous places where we check this
         and warn/complain.
      
      2. At the IPI receiver side:
      
         It is incorrect to let the world know of our presence (by setting
         ourselves in global bitmasks) until our initialization steps are complete
         to such an extent that we can handle the consequences (such as
         receiving interrupts without crashing the sender etc.)
      " (from Srivatsa)
      
      As the native code enables the interrupts at some point we need to be
      able to service them. In other words a CPU must have valid IPI vectors
      if it has been marked online.
      
      It doesn't need to handle the IPI (interrupts may be disabled) but needs
      to have valid IPI vectors because another CPU may find it in cpu_online_mask
      and attempt to send it an IPI.
      
      This patch will change the order of the Xen vCPU bring-up functions so that
      Xen vectors have been set up before start_secondary() is called.
      It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
      to initialize it.
      
      Orabug 13823853
      Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
      Acked-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fc78d343
    • D
      x86/xen: during early setup, only 1:1 map the ISA region · e201bfcc
      David Vrabel 提交于
      During early setup, when the reserved regions and MMIO holes are being
      setup as 1:1 in the p2m, clear any mappings instead of making them 1:1
      (execept for the ISA region which is expected to be mapped).
      
      This fixes a regression introduced in 3.5 by 83d51ab4 (xen/setup:
      update VA mapping when releasing memory during setup) which caused
      hosts with tboot to fail to boot.
      
      tboot marks a region in the e820 map as unusable and the dom0 kernel
      would attempt to map this region and Xen does not permit unusable
      regions to be mapped by guests.
      
      (XEN)  0000000000000000 - 0000000000060000 (usable)
      (XEN)  0000000000060000 - 0000000000068000 (reserved)
      (XEN)  0000000000068000 - 000000000009e000 (usable)
      (XEN)  0000000000100000 - 0000000000800000 (usable)
      (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
      (XEN)  0000000000972000 - 00000000cf200000 (usable)
      (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
      (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
      (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
      (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
      (XEN)  00000000fe000000 - 0000000100000000 (reserved)
      (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      e201bfcc
    • D
      x86/xen: disable premption when enabling local irqs · fb58e300
      David Vrabel 提交于
      If CONFIG_PREEMPT is enabled then xen_enable_irq() (and
      xen_restore_fl()) could be preempted and rescheduled on a different
      VCPU in between the clear of the mask and the check for pending
      events.  This may result in events being lost as the upcall will check
      for pending events on the wrong VCPU.
      
      Fix this by disabling preemption around the unmask and check for
      events.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fb58e300
    • D
      x86/xen: do not identity map UNUSABLE regions in the machine E820 · 3bc38cbc
      David Vrabel 提交于
      If there are UNUSABLE regions in the machine memory map, dom0 will
      attempt to map them 1:1 which is not permitted by Xen and the kernel
      will crash.
      
      There isn't anything interesting in the UNUSABLE region that the dom0
      kernel needs access to so we can avoid making the 1:1 mapping and
      treat it as RAM.
      
      We only do this for dom0, as that is where tboot case shows up.
      A PV domU could have an UNUSABLE region in its pseudo-physical map
      and would need to be handled in another patch.
      
      This fixes a boot failure on hosts with tboot.
      
      tboot marks a region in the e820 map as unusable and the dom0 kernel
      would attempt to map this region and Xen does not permit unusable
      regions to be mapped by guests.
      
        (XEN)  0000000000000000 - 0000000000060000 (usable)
        (XEN)  0000000000060000 - 0000000000068000 (reserved)
        (XEN)  0000000000068000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 0000000000800000 (usable)
        (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
        (XEN)  0000000000972000 - 00000000cf200000 (usable)
        (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
        (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
        (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
        (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
        (XEN)  00000000fe000000 - 0000000100000000 (reserved)
        (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      [v1: Altered the patch and description with domU's with UNUSABLE regions]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3bc38cbc
  8. 09 8月, 2013 9 次提交
  9. 07 8月, 2013 1 次提交
  10. 05 8月, 2013 1 次提交
    • J
      x86: Correctly detect hypervisor · 9df56f19
      Jason Wang 提交于
      We try to handle the hypervisor compatibility mode by detecting hypervisor
      through a specific order. This is not robust, since hypervisors may implement
      each others features.
      
      This patch tries to handle this situation by always choosing the last one in the
      CPUID leaves. This is done by letting .detect() return a priority instead of
      true/false and just re-using the CPUID leaf where the signature were found as
      the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
      has the highest priority. Other sophisticated detection method could also be
      implemented on top.
      
      Suggested by H. Peter Anvin and Paolo Bonzini.
      Acked-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Doug Covelli <dcovelli@vmware.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9df56f19
  11. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  12. 29 6月, 2013 3 次提交
    • D
      x86: xen: Sync the CMOS RTC as well as the Xen wallclock · 47433b8c
      David Vrabel 提交于
      Adjustments to Xen's persistent clock via update_persistent_clock()
      don't actually persist, as the Xen wallclock is a software only clock
      and modifications to it do not modify the underlying CMOS RTC.
      
      The x86_platform.set_wallclock hook is there to keep the hardware RTC
      synchronized. On a guest this is pointless.
      
      On Dom0 we can use the native implementaion which actually updates the
      hardware RTC, but we still need to keep the software emulation of RTC
      for the guests up to date. The subscription to the pvclock_notifier
      allows us to emulate this easily. The notifier is called at every tick
      and when the clock was set.
      
      Right now we only use that notifier when the clock was set, but due to
      the fact that it is called periodically from the timekeeping update
      code, we can utilize it to emulate the NTP driven drift compensation
      of update_persistant_clock() for the Xen wall (software) clock.
      
      Add a 11 minutes periodic update to the pvclock_gtod notifier callback
      to achieve that. The static variable 'next' which maintains that 11
      minutes update cycle is protected by the core code serialization so
      there is no need to add a Xen specific serialization mechanism.
      
      [ tglx: Massaged changelog and added a few comments ]
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      47433b8c
    • D
      x86: xen: Sync the wallclock when the system time is set · 5584880e
      David Vrabel 提交于
      Currently the Xen wallclock is only updated every 11 minutes if NTP is
      synchronized to its clock source (using the sync_cmos_clock() work).
      If a guest is started before NTP is synchronized it may see an
      incorrect wallclock time.
      
      Use the pvclock_gtod notifier chain to receive a notification when the
      system time has changed and update the wallclock to match.
      
      This chain is called on every timer tick and we want to avoid an extra
      (expensive) hypercall on every tick.  Because dom0 has historically
      never provided a very accurate wallclock and guests do not expect one,
      we can do this simply: the wallclock is only updated if the clock was
      set.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5584880e
    • L
      xen/time: remove blocked time accounting from xen "clockchip" · 0b0c002c
      Laszlo Ersek 提交于
      ... because the "clock_event_device framework" already accounts for idle
      time through the "event_handler" function pointer in
      xen_timer_interrupt().
      
      The patch is intended as the completion of [1]. It should fix the double
      idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
      stolen time accounting (the removed code seems to be isolated).
      
      The approach may be completely misguided.
      
      [1] https://lkml.org/lkml/2011/10/6/10
      [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
      
      John took the time to retest this patch on top of v3.10 and reported:
      "idle time is correctly incremented for pv and hvm for the normal
      case, nohz=off and nohz=idle." so lets put this patch in.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NLaszlo Ersek <lersek@redhat.com>
      Signed-off-by: NJohn Haxby <john.haxby@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0b0c002c
  13. 10 6月, 2013 8 次提交
    • K
      xen/time: Free onlined per-cpu data structure if we want to online it again. · 09e99da7
      Konrad Rzeszutek Wilk 提交于
      If the per-cpu time data structure has been onlined already and
      we are trying to online it again, then free the previous copy
      before blindly over-writting it.
      
      A developer naturally should not call this function multiple times
      but just in case.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      09e99da7
    • K
      xen/time: Check that the per_cpu data structure has data before freeing. · a05e2c37
      Konrad Rzeszutek Wilk 提交于
      We don't check whether the per_cpu data structure has actually
      been freed in the past. This checks it and if it has been freed
      in the past then just continues on without double-freeing.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a05e2c37
    • K
      xen/time: Don't leak interrupt name when offlining. · c9d76a24
      Konrad Rzeszutek Wilk 提交于
      When the user does:
          echo 0 > /sys/devices/system/cpu/cpu1/online
          echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      One of the leaks is from xen/time:
      
      unreferenced object 0xffff88003fa51280 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
          [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes it by stashing away the 'name' in the per-cpu
      data structure and freeing it when offlining the CPU.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c9d76a24
    • K
      xen/time: Encapsulate the struct clock_event_device in another structure. · 31620a19
      Konrad Rzeszutek Wilk 提交于
      We don't do any code movement. We just encapsulate the struct clock_event_device
      in a new structure which contains said structure and a pointer to
      a char *name. The 'name' will be used in 'xen/time: Don't leak interrupt
      name when offlining'.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      31620a19
    • K
      xen/spinlock: Don't leak interrupt name when offlining. · 354e7b76
      Konrad Rzeszutek Wilk 提交于
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51260 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00  spinlock1.......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
          [<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining"
      patch did (which has a per-cpu structure which contains both the
      IRQ number and char*) we use a per-cpu pointers to a *char.
      
      The reason is that the "__this_cpu_read(lock_kicker_irq);" macro
      blows up with "__bad_size_call_parameter()" as the size of the
      returned structure is not within the parameters of what it expects
      and optimizes for.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      354e7b76
    • K
      xen/smp: Don't leak interrupt name when offlining. · b85fffec
      Konrad Rzeszutek Wilk 提交于
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51240 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
          [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes some of it by using the 'struct xen_common_irq->name'
      field to stash away the char so that it can be freed when
      the interrupt line is destroyed.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b85fffec
    • K
      xen/smp: Set the per-cpu IRQ number to a valid default. · ee336e10
      Konrad Rzeszutek Wilk 提交于
      When we free it we want to make sure to set it to a default
      value of -1 so that we don't double-free it (in case somebody
      calls us twice).
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ee336e10
    • K
      xen/smp: Introduce a common structure to contain the IRQ name and interrupt line. · 9547689f
      Konrad Rzeszutek Wilk 提交于
      This patch adds a new structure to contain the common two things
      that each of the per-cpu interrupts need:
       - an interrupt number,
       - and the name of the interrupt (to be added in 'xen/smp: Don't leak
         interrupt name when offlining').
      
      This allows us to carry the tuple of the per-cpu interrupt data structure
      and expand it as we need in the future.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9547689f