1. 22 1月, 2014 1 次提交
    • R
      xen/pvh: Set X86_CR0_WP and others in CR0 (v2) · c9f6e997
      Roger Pau Monne 提交于
      otherwise we will get for some user-space applications
      that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
      end up hitting an assert in glibc manifested by:
      
      general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in
      libc-2.13.so[7f807209e000+180000]
      
      This is due to the nature of said operations which sets and clears
      the PID.  "In the successful one I can see that the page table of
      the parent process has been updated successfully to use a
      different physical page, so the write of the tid on
      that page only affects the child...
      
      On the other hand, in the failed case, the write seems to happen before
      the copy of the original page is done, so both the parent and the child
      end up with the same value (because the parent copies the page after
      the write of the child tid has already happened)."
      (Roger's analysis). The nature of this is due to the Xen's commit
      of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5
      "x86/pvh: set only minimal cr0 and cr4 flags in order to use paging"
      the CR0_WP was removed so COW features of the Linux kernel were not
      operating properly.
      
      While doing that also update the rest of the CR0 flags to be inline
      with what a baremetal Linux kernel would set them to.
      
      In 'secondary_startup_64' (baremetal Linux) sets:
      
      X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
      X86_CR0_AM | X86_CR0_PG
      
      The hypervisor for HVM type guests (which PVH is a bit) sets:
      X86_CR0_PE | X86_CR0_ET | X86_CR0_TS
      For PVH it specifically sets:
      X86_CR0_PG
      
      Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE  |
      X86_CR0_WP | X86_CR0_AM to have full parity.
      Signed-off-by: NRoger Pau Monne <roger.pau@citrix.com>
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Took out the cr4 writes to be a seperate patch]
      [v2: 0-DAY kernel found xen_setup_gdt to be missing a static]
      c9f6e997
  2. 06 1月, 2014 1 次提交
    • M
      xen/pvh: Secondary VCPU bringup (non-bootup CPUs) · 5840c84b
      Mukesh Rathor 提交于
      The VCPU bringup protocol follows the PV with certain twists.
      From xen/include/public/arch-x86/xen.h:
      
      Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
      for HVM and PVH guests, not all information in this structure is updated:
      
       - For HVM guests, the structures read include: fpu_ctxt (if
       VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
      
       - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
       set cr3. All other fields not used should be set to 0.
      
      This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
      a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
      can load per-cpu data-structures. It has no effect on the VCPU0.
      
      We also piggyback on the %rdi register to pass in the CPU number - so
      that when we bootup a new CPU, the cpu_bringup_and_idle will have
      passed as the first parameter the CPU number (via %rdi for 64-bit).
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5840c84b
  3. 07 11月, 2013 1 次提交
  4. 10 10月, 2013 1 次提交
    • F
      xen: Fix possible user space selector corruption · 7cde9b27
      Frediano Ziglio 提交于
      Due to the way kernel is initialized under Xen is possible that the
      ring1 selector used by the kernel for the boot cpu end up to be copied
      to userspace leading to segmentation fault in the userspace.
      
      Xen code in the kernel initialize no-boot cpus with correct selectors (ds
      and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
      On task context switch (switch_to) we assume that ds, es and cs already
      point to __USER_DS and __KERNEL_CSso these selector are not changed.
      
      If processor is an Intel that support sysenter instruction sysenter/sysexit
      is used so ds and es are not restored switching back from kernel to
      userspace. In the case the selectors point to a ring1 instead of __USER_DS
      the userspace code will crash on first memory access attempt (to be
      precise Xen on the emulated iret used to do sysexit will detect and set ds
      and es to zero which lead to GPF anyway).
      
      Now if an userspace process call kernel using sysenter and get rescheduled
      (for me it happen on a specific init calling wait4) could happen that the
      ring1 selector is set to ds and es.
      
      This is quite hard to detect cause after a while these selectors are fixed
      (__USER_DS seems sticky).
      
      Bisecting the code commit 7076aada appears
      to be the first one that have this issue.
      Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      7cde9b27
  5. 10 9月, 2013 2 次提交
    • K
      xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVM · 26a79995
      Konrad Rzeszutek Wilk 提交于
      Before this patch we would patch all of the pv_lock_ops sites
      using alternative assembler. Then later in the bootup cycle
      change the unlock_kick and lock_spinning to the Xen specific -
      without re patching.
      
      That meant that for the core of the kernel we would be running
      with the baremetal version of unlock_kick and lock_spinning while
      for modules we would have the proper Xen specific slowpaths.
      
      As most of the module uses some API from the core kernel that ended
      up with slowpath lockers waiting forever to be kicked (b/c they
      would be using the Xen specific slowpath logic). And the
      kick never came b/c the unlock path that was taken was the
      baremetal one.
      
      On PV we do not have the problem as we initialise before the
      alternative code kicks in.
      
      The fix is to make the updating of the pv_lock_ops function
      be done before the alternative code starts patching.
      
      Note that this patch fixes issues discovered by commit
      f10cd522.
      ("xen: disable PV spinlocks on HVM") wherein it mentioned
      
         PV spinlocks cannot possibly work with the current code because they are
         enabled after pvops patching has already been done, and because PV
         spinlocks use a different data structure than native spinlocks so we
         cannot switch between them dynamically.
      
      The first problem is solved by this patch.
      
      The second problem has been solved by commit
      816434ec
      (Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
      
      P.S.
      There is still the commit 70dd4998
      (xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM) to
      revert but that can be done later after all other bugs have been
      fixed.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      26a79995
    • K
      xen/spinlock: Fix locking path engaging too soon under PVHVM. · 1fb3a8b2
      Konrad Rzeszutek Wilk 提交于
      The xen_lock_spinning has a check for the kicker interrupts
      and if it is not initialized it will spin normally (not enter
      the slowpath).
      
      But for PVHVM case we would initialize the kicker interrupt
      before the CPU came online. This meant that if the booting
      CPU used a spinlock and went in the slowpath - it would
      enter the slowpath and block forever. The forever part because
      during bootup: the spinlock would be taken _before_ the CPU
      sets itself to be online (more on this further), and we enter
      to poll on the event channel forever.
      
      The bootup CPU (see commit fc78d343
      "xen/smp: initialize IPI vectors before marking CPU online"
      for details) and the CPU that started the bootup consult
      the cpu_online_mask to determine whether the booting CPU should
      get an IPI. The booting CPU has to set itself in this mask via:
      
        set_cpu_online(smp_processor_id(), true);
      
      However, if the spinlock is taken before this (and it is) and
      it polls on an event channel - it will never be woken up as
      the kernel will never send an IPI to an offline CPU.
      
      Note that the PVHVM logic in sending IPIs is using the HVM
      path which has numerous checks using the cpu_online_mask
      and cpu_active_mask. See above mention git commit for details.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      1fb3a8b2
  6. 20 8月, 2013 1 次提交
    • C
      xen/smp: initialize IPI vectors before marking CPU online · fc78d343
      Chuck Anderson 提交于
      An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      RCU has detected that a CPU has not entered a quiescent state within the
      grace period.  It needs to send the CPU a reschedule IPI if it is not
      offline.  rcu_implicit_offline_qs() does this check:
      
      	/*
      	 * If the CPU is offline, it is in a quiescent state.  We can
      	 * trust its state not to change because interrupts are disabled.
      	 */
      	if (cpu_is_offline(rdp->cpu)) {
      		rdp->offline_fqs++;
      		return 1;
      	}
      
      	Else the CPU is online.  Send it a reschedule IPI.
      
      The CPU is in the middle of being hot-plugged and has been marked online
      (!cpu_is_offline()).  See start_secondary():
      
      	set_cpu_online(smp_processor_id(), true);
      	...
      	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
      
      start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
      mark it as active:
      
      	/*
      	 * Wait until the cpu which brought this one up marked it
      	 * online before enabling interrupts. If we don't do that then
      	 * we can end up waking up the softirq thread before this cpu
      	 * reached the active state, which makes the scheduler unhappy
      	 * and schedule the softirq thread on the wrong cpu. This is
      	 * only observable with forced threaded interrupts, but in
      	 * theory it could also happen w/o them. It's just way harder
      	 * to achieve.
      	 */
      	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
      		cpu_relax();
      
      	/* enable local interrupts */
      	local_irq_enable();
      
      The CPU being hot-plugged will be marked active after it has been fully
      initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
      xen_smp_intr_init() is called to set up the hot-plugged vCPU's
      XEN_RESCHEDULE_VECTOR.
      
      The hot-plugging CPU is marked online, not marked active and does not have
      its IPI vectors set up.  rcu_implicit_offline_qs() sees the hot-plugging
      cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
      This will lead to:
      
      	kernel BUG at drivers/xen/events.c:1328!
      
      	xen_send_IPI_one()
      	xen_smp_send_reschedule()
      	rcu_implicit_offline_qs()
      	rcu_implicit_dynticks_qs()
      	force_qs_rnp()
      	force_quiescent_state()
      	__rcu_process_callbacks()
      	rcu_process_callbacks()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      
      because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
      the XEN_RESCHEDULE_VECTOR.
      
      There is at least one other place that has caused the same crash:
      
      	xen_smp_send_reschedule()
      	wake_up_idle_cpu()
      	add_timer_on()
      	clocksource_watchdog()
      	call_timer_fn()
      	run_timer_softirq()
      	__do_softirq()
      	call_softirq()
      	do_softirq()
      	irq_exit()
      	xen_evtchn_do_upcall()
      	xen_hvm_callback_vector()
      
      clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
      a watchdog timer:
      
      	/*
      	 * Cycle through CPUs to check if the CPUs stay synchronized
      	 * to each other.
      	 */
      	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
      	if (next_cpu >= nr_cpu_ids)
      		next_cpu = cpumask_first(cpu_online_mask);
      	watchdog_timer.expires += WATCHDOG_INTERVAL;
      	add_timer_on(&watchdog_timer, next_cpu);
      
      This resulted in an attempt to send an IPI to a hot-plugging CPU that
      had not initialized its reschedule vector. One option would be to make
      the RCU code check to not check for CPU offline but for CPU active.
      As becoming active is done after a CPU is online (in older kernels).
      
      But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
      completely reworked - in the online path, cpu_active is set *before* cpu_online,
      and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
      notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
      path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
      to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
      is better left to the scheduler alone, since it has the intelligence to use it
      judiciously."
      
      The conclusion was that:
      "
      1. At the IPI sender side:
      
         It is incorrect to send an IPI to an offline CPU (cpu not present in
         the cpu_online_mask). There are numerous places where we check this
         and warn/complain.
      
      2. At the IPI receiver side:
      
         It is incorrect to let the world know of our presence (by setting
         ourselves in global bitmasks) until our initialization steps are complete
         to such an extent that we can handle the consequences (such as
         receiving interrupts without crashing the sender etc.)
      " (from Srivatsa)
      
      As the native code enables the interrupts at some point we need to be
      able to service them. In other words a CPU must have valid IPI vectors
      if it has been marked online.
      
      It doesn't need to handle the IPI (interrupts may be disabled) but needs
      to have valid IPI vectors because another CPU may find it in cpu_online_mask
      and attempt to send it an IPI.
      
      This patch will change the order of the Xen vCPU bring-up functions so that
      Xen vectors have been set up before start_secondary() is called.
      It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
      to initialize it.
      
      Orabug 13823853
      Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
      Acked-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fc78d343
  7. 09 8月, 2013 2 次提交
  8. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  9. 10 6月, 2013 4 次提交
    • K
      xen/smp: Don't leak interrupt name when offlining. · b85fffec
      Konrad Rzeszutek Wilk 提交于
      When the user does:
      echo 0 > /sys/devices/system/cpu/cpu1/online
      echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      unreferenced object 0xffff88003fa51240 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
          [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes some of it by using the 'struct xen_common_irq->name'
      field to stash away the char so that it can be freed when
      the interrupt line is destroyed.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b85fffec
    • K
      xen/smp: Set the per-cpu IRQ number to a valid default. · ee336e10
      Konrad Rzeszutek Wilk 提交于
      When we free it we want to make sure to set it to a default
      value of -1 so that we don't double-free it (in case somebody
      calls us twice).
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ee336e10
    • K
      xen/smp: Introduce a common structure to contain the IRQ name and interrupt line. · 9547689f
      Konrad Rzeszutek Wilk 提交于
      This patch adds a new structure to contain the common two things
      that each of the per-cpu interrupts need:
       - an interrupt number,
       - and the name of the interrupt (to be added in 'xen/smp: Don't leak
         interrupt name when offlining').
      
      This allows us to carry the tuple of the per-cpu interrupt data structure
      and expand it as we need in the future.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9547689f
    • K
      xen/smp: Coalesce the free_irq calls in one function. · 53b94fdc
      Konrad Rzeszutek Wilk 提交于
      There are two functions that do a bunch of 'free_irq' on
      the per_cpu IRQ. Instead of having duplicate code just move
      it to one function.
      
      This is just code movement.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      53b94fdc
  10. 04 6月, 2013 1 次提交
    • K
      xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU. · 466318a8
      Konrad Rzeszutek Wilk 提交于
      The xen_play_dead is an undead function. When the vCPU is told to
      offline it ends up calling xen_play_dead wherin it calls the
      VCPUOP_down hypercall which offlines the vCPU. However, when the
      vCPU is onlined back, it resumes execution right after
      VCPUOP_down hypercall.
      
      That was OK (albeit the API for play_dead assumes that the CPU
      stays dead and never returns) but with commit 4b0c0f29
      (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe
      as said commit resets the ts->inidle which at the start of the
      cpu_idle loop was set.
      
      The net effect is that we get this warn:
      
      Broke affinity for irq 16
      installing Xen timer for CPU 1
      cpu 1 spinlock event irq 48
      ------------[ cut here ]------------
      WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
      Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33a #1
      Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
       ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
       ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
       0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
      Call Trace:
       [<ffffffff816707c8>] dump_stack+0x19/0x1b
       [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
       [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
       [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
       [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
       [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
      ---[ end trace 915c8c486004dda1 ]---
      
      b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround
      to call tick_nohz_idle_enter before returning from xen_play_dead() - and
      that is what this patch does and fixes the issue.
      
      We also add the stable part b/c git commit 4b0c0f29 is on the stable
      tree.
      
      CC: stable@vger.kernel.org
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      466318a8
  11. 29 5月, 2013 1 次提交
    • S
      xen: Clean up apic ipi interface · 1db01b49
      Stefan Bader 提交于
      Commit f447d56d introduced the
      implementation of the PV apic ipi interface. But there were some
      odd things (it seems none of which cause really any issue but
      maybe they should be cleaned up anyway):
       - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
         ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
         vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
       - physflat_send_IPI_allbutself is declared unnecessarily. It is never
         used.
      
      This patch tries to clean up those things.
      Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1db01b49
  12. 17 4月, 2013 4 次提交
    • K
      xen/smp: Unifiy some of the PVs and PVHVM offline CPU path · b12abaa1
      Konrad Rzeszutek Wilk 提交于
      The "xen_cpu_die" and "xen_hvm_cpu_die" are very similar.
      Lets coalesce them.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b12abaa1
    • K
      xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. · 27d8b207
      Konrad Rzeszutek Wilk 提交于
      There is no need to use the PV version of the IRQ_WORKER mechanism
      as under PVHVM we are using the native version. The native
      version is using the SMP API.
      
      They just sit around unused:
      
        69:          0          0  xen-percpu-ipi       irqwork0
        83:          0          0  xen-percpu-ipi       irqwork1
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      27d8b207
    • K
      xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline · 66ff0fe9
      Konrad Rzeszutek Wilk 提交于
      While we don't use the spinlock interrupt line (see for details
      commit f10cd522 -
      xen: disable PV spinlocks on HVM) - we should still do the proper
      init / deinit sequence. We did not do that correctly and for the
      CPU init for PVHVM guest we would allocate an interrupt line - but
      failed to deallocate the old interrupt line.
      
      This resulted in leakage of an irq_desc but more importantly this splat
      as we online an offlined CPU:
      
      genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
      Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
      Call Trace:
       [<ffffffff811156de>] __setup_irq+0x23e/0x4a0
       [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
       [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
       [<ffffffff810cad35>] ? update_max_interval+0x15/0x40
       [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
       [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
       [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
       [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff8109402b>] __cpu_notify+0x1b/0x30
       [<ffffffff8166834a>] _cpu_up+0xa0/0x14b
       [<ffffffff816684ce>] cpu_up+0xd9/0xec
       [<ffffffff8165f754>] store_online+0x94/0xd0
       [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
       [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
       [<ffffffff811a2864>] vfs_write+0xb4/0x130
       [<ffffffff811a302a>] sys_write+0x5a/0xa0
       [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
      cpu 1 spinlock event irq -16
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      
      And if one looks at the /proc/interrupts right after
      offlining (CPU1):
      
        70:          0          0  xen-percpu-ipi       spinlock0
        71:          0          0  xen-percpu-ipi       spinlock1
        77:          0          0  xen-percpu-ipi       spinlock2
      
      There is the oddity of the 'spinlock1' still being present.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      66ff0fe9
    • K
      xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. · 888b65b4
      Konrad Rzeszutek Wilk 提交于
      In the PVHVM path when we do CPU online/offline path we would
      leak the timer%d IRQ line everytime we do a offline event. The
      online path (xen_hvm_setup_cpu_clockevents via
      x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
      line for the timer%d.
      
      But we would still use the old interrupt line leading to:
      
      kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
      invalid opcode: 0000 [#1] SMP
      RIP: 0010:[<ffffffff810b9e21>]  [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
      .. snip..
       <IRQ>
       [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
       [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
       [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
       [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
       [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
       [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
       [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
       <EOI>
       [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
       [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
      
      There is also the oddity (timer1) in the /proc/interrupts after
      offlining CPU1:
      
        64:       1121          0  xen-percpu-virq      timer0
        78:          0          0  xen-percpu-virq      timer1
        84:          0       2483  xen-percpu-virq      timer2
      
      This patch fixes it.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      888b65b4
  13. 08 4月, 2013 1 次提交
  14. 20 2月, 2013 1 次提交
  15. 16 1月, 2013 1 次提交
  16. 18 12月, 2012 1 次提交
  17. 23 8月, 2012 1 次提交
    • R
      x86/smp: Don't ever patch back to UP if we unplug cpus · 816afe4f
      Rusty Russell 提交于
      We still patch SMP instructions to UP variants if we boot with a
      single CPU, but not at any other time.  In particular, not if we
      unplug CPUs to return to a single cpu.
      
      Paul McKenney points out:
      
       mean offline overhead is 6251/48=130.2 milliseconds.
      
       If I remove the alternatives_smp_switch() from the offline
       path [...] the mean offline overhead is 550/42=13.1 milliseconds
      
      Basically, we're never going to get those 120ms back, and the
      code is pretty messy.
      
      We get rid of:
      
       1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
          documentation is wrong. It's now the default.
      
       2) The skip_smp_alternatives flag used by suspend.
      
       3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
          which were only used to set this one flag.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Paul McKenney <paul.mckenney@us.ibm.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.auSigned-off-by: NIngo Molnar <mingo@kernel.org>
      816afe4f
  18. 05 6月, 2012 1 次提交
  19. 21 5月, 2012 1 次提交
    • K
      xen/smp: unbind irqworkX when unplugging vCPUs. · 2f1bd67d
      Konrad Rzeszutek Wilk 提交于
      The git commit  1ff2b0c3
      "xen: implement IRQ_WORK_VECTOR handler" added the functionality
      to have a per-cpu "irqworkX" for the IPI APIC functionality.
      However it missed the unbind when a vCPU is unplugged resulting
      in an orphaned per-cpu interrupt line for unplugged vCPU:
      
        30:        216          0   xen-dyn-event     hvc_console
        31:        810          4   xen-dyn-event     eth0
        32:         29          0   xen-dyn-event     blkif
      - 36:          0          0  xen-percpu-ipi       irqwork2
      - 37:        287          0   xen-dyn-event     xenbus
      + 36:        287          0   xen-dyn-event     xenbus
       NMI:          0          0   Non-maskable interrupts
       LOC:          0          0   Local timer interrupts
       SPU:          0          0   Spurious interrupts
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2f1bd67d
  20. 08 5月, 2012 2 次提交
  21. 27 4月, 2012 1 次提交
    • K
      xen/smp: Fix crash when booting with ACPI hotplug CPUs. · cf405ae6
      Konrad Rzeszutek Wilk 提交于
      When we boot on a machine that can hotplug CPUs and we
      are using 'dom0_max_vcpus=X' on the Xen hypervisor line
      to clip the amount of CPUs available to the initial domain,
      we get this:
      
      (XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all
      .. snip..
      DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011
      .. snip.
      SMP: Allowing 64 CPUs, 32 hotplug CPUs
      installing Xen timer for CPU 7
      cpu 7 spinlock event irq 361
      NMI watchdog: disabled (cpu7): hardware events not enabled
      Brought up 8 CPUs
      .. snip..
      	[acpi processor finds the CPUs are not initialized and starts calling
      	arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online]
      CPU 8 got hotplugged
      CPU 9 got hotplugged
      CPU 10 got hotplugged
      .. snip..
      initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs
      calling  erst_init+0x0/0x2bb @ 1
      
      	[and the scheduler sticks newly started tasks on the new CPUs, but
      	said CPUs cannot be initialized b/c the hypervisor has limited the
      	amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag.
      	The spinlock tries to kick the other CPU, but the structure for that
      	is not initialized and we crash.]
      BUG: unable to handle kernel paging request at fffffffffffffed8
      IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60
      PGD 180d067 PUD 180e067 PMD 0
      Oops: 0002 [#1] SMP
      CPU 7
      Modules linked in:
      
      Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP
      RIP: e030:[<ffffffff81035289>]  [<ffffffff81035289>] xen_spin_lock+0x29/0x60
      RSP: e02b:ffff8801fb9b3a70  EFLAGS: 00010282
      
      With this patch, we cap the amount of vCPUS that the initial domain
      can run, to exactly what dom0_max_vcpus=X has specified.
      
      In the future, if there is a hypercall that will allow a running
      domain to expand past its initial set of vCPUS, this patch should
      be re-evaluated.
      
      CC: stable@kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cf405ae6
  22. 26 4月, 2012 2 次提交
  23. 07 4月, 2012 1 次提交
  24. 22 3月, 2012 1 次提交
    • K
      xen/smp: Fix bringup bug in AP code. · 106b4438
      Konrad Rzeszutek Wilk 提交于
      The CPU hotplug code has now a callback to help bring up the CPU.
      Without the call we end up getting:
      
       BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]
      Modules linked in:
      CPU ] Pid: 6, comm: migration/0 Not tainted 3.3.0upstream-01180-ged378a52 #1 Dell Inc. PowerEdge T105 /0RR825
      RIP: e030:[<ffffffff810d3b8b>]  [<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf0
      RSP: e02b:ffff8800ceaabdb0  EFLAGS: 00000293
      .. snip..
      Call Trace:
       [<ffffffff810d3b10>] ? stop_one_cpu_nowait+0x50/0x50
       [<ffffffff810d3841>] cpu_stopper_thread+0xf1/0x1c0
       [<ffffffff815a9776>] ? __schedule+0x3c6/0x760
       [<ffffffff815aa749>] ? _raw_spin_unlock_irqrestore+0x19/0x30
       [<ffffffff810d3750>] ? res_counter_charge+0x150/0x150
       [<ffffffff8108dc76>] kthread+0x96/0xa0
       [<ffffffff815b27e4>] kernel_thread_helper+0x4/0x10
       [<ffffffff815aacbc>] ? retint_restore_ar
      
      Thix fixes it.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      106b4438
  25. 04 2月, 2012 1 次提交
    • K
      xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic. · 41bd956d
      Konrad Rzeszutek Wilk 提交于
      When a user offlines a VCPU and then onlines it, we get:
      
      NMI watchdog disabled (cpu2): hardware events not enabled
      BUG: scheduling while atomic: swapper/2/0/0x00000002
      Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod libcrc32c crc32c radeon fbco
       ttm bitblit softcursor drm_kms_helper xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs [last unloaded:
      
      Pid: 0, comm: swapper/2 Tainted: G           O 3.2.0phase15.1-00003-gd6f7f5b-dirty #4
      Call Trace:
       [<ffffffff81070571>] __schedule_bug+0x61/0x70
       [<ffffffff8158eb78>] __schedule+0x798/0x850
       [<ffffffff8158ed6a>] schedule+0x3a/0x50
       [<ffffffff810349be>] cpu_idle+0xbe/0xe0
       [<ffffffff81583599>] cpu_bringup_and_idle+0xe/0x10
      
      The reason for this should be obvious from this call-chain:
      cpu_bringup_and_idle:
       \- cpu_bringup
        |   \-[preempt_disable]
        |
        |- cpu_idle
             \- play_dead [assuming the user offlined the VCPU]
             |     \
             |     +- (xen_play_dead)
             |          \- HYPERVISOR_VCPU_off [so VCPU is dead, once user
             |          |                       onlines it starts from here]
             |          \- cpu_bringup [preempt_disable]
             |
             +- preempt_enable_no_reschedule()
             +- schedule()
             \- preempt_enable()
      
      So we have two preempt_disble() and one preempt_enable(). Calling
      preempt_enable() after the cpu_bringup() in the xen_play_dead
      fixes the imbalance.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      41bd956d
  26. 25 1月, 2012 1 次提交
  27. 09 9月, 2011 1 次提交
  28. 02 9月, 2011 1 次提交
  29. 22 8月, 2011 1 次提交
  30. 16 6月, 2011 1 次提交
    • A
      xen: support CONFIG_MAXSMP · 900cba88
      Andrew Jones 提交于
      The MAXSMP config option requires CPUMASK_OFFSTACK, which in turn
      requires we init the memory for the maps while we bring up the cpus.
      MAXSMP also increases NR_CPUS to 4096. This increase in size exposed an
      issue in the argument construction for multicalls from
      xen_flush_tlb_others. The args should only need space for the actual
      number of cpus.
      
      Also in 2.6.39 it exposes a bootup problem.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8157a1d3>] set_cpu_sibling_map+0x123/0x30d
      ...
      Call Trace:
      [<ffffffff81039a3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
      [<ffffffff819dc4db>] xen_smp_prepare_cpus+0x36/0x135
      ..
      
      CC: stable@kernel.org
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      [v2: Updated to compile on 3.0]
      [v3: Updated to compile when CONFIG_SMP is not defined]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      900cba88