1. 13 Sep, 2012: 13 commits
  2. 11 Sep, 2012: 10 commits
  3. 07 Sep, 2012: 4 commits
  4. 06 Sep, 2012: 3 commits
  5. 05 Sep, 2012: 10 commits
    • xen: fix logical error in tlb flushing · ce7184bd
      Committed by Alex Shi
      While TLB_FLUSH_ALL gets passed as the 'end' argument to
      flush_tlb_others(), the Xen code was checking its 'start'
      parameter. That may select the wrong op.cmd, MMUEXT_INVLPG_MULTI
      instead of MMUEXT_TLB_FLUSH_MULTI, and as a result some pages are
      not flushed from the TLB.
      
      This patch fixes the issue.
      Reported-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Yongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
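A minimal sketch of the bug, assuming (as the commit states) that a full flush is requested by passing TLB_FLUSH_ALL as `end`. The enum values and the `pick_cmd_*` helpers are hypothetical stand-ins, not the real Xen or kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Xen MMUEXT commands named in the commit;
 * the real values come from Xen's public headers. */
enum mmuext_cmd { MMUEXT_TLB_FLUSH_MULTI, MMUEXT_INVLPG_MULTI };

/* Assumption: an all-ones sentinel in 'end' means "flush everything". */
#define TLB_FLUSH_ALL UINT64_MAX

/* Buggy variant: tests 'start', so a full-flush request
 * (start = 0, end = TLB_FLUSH_ALL) is treated as a page invalidation. */
enum mmuext_cmd pick_cmd_buggy(uint64_t start, uint64_t end)
{
    (void)end;
    return start == TLB_FLUSH_ALL ? MMUEXT_TLB_FLUSH_MULTI : MMUEXT_INVLPG_MULTI;
}

/* Fixed variant: the sentinel travels in 'end', so test 'end'. */
enum mmuext_cmd pick_cmd_fixed(uint64_t start, uint64_t end)
{
    (void)start;
    return end == TLB_FLUSH_ALL ? MMUEXT_TLB_FLUSH_MULTI : MMUEXT_INVLPG_MULTI;
}
```

With `start = 0` and `end = TLB_FLUSH_ALL`, the buggy helper returns MMUEXT_INVLPG_MULTI and the fixed one returns MMUEXT_TLB_FLUSH_MULTI.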
    • xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Committed by Konrad Rzeszutek Wilk
      We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
      inclusive) when trying to figure out whether we can re-use some of the
      P2M middle leaves.
      
      This meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512,
      we would try to use the entry at index 512. Fortunately for us,
      p2m_top_index() has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit, producing this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
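The off-by-one can be sketched as an inclusive versus exclusive loop bound. This is a hypothetical model: `P2M_TOP_ENTRIES`, `p2m_top` and `top_entry()` are illustrative names, and `assert()` plays the role of the `BUG_ON()` quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size standing in for the P2M top-directory limit. */
#define P2M_TOP_ENTRIES 512

static int p2m_top[P2M_TOP_ENTRIES];

/* Bounds-checked accessor; the assert mirrors BUG_ON(idx >= limit). */
static int top_entry(size_t idx)
{
    assert(idx < P2M_TOP_ENTRIES);
    return p2m_top[idx];
}

/* Buggy scan (not called here): 'i <= P2M_TOP_ENTRIES' would reach
 * index 512, one past the end, and trip the assert.
 * Fixed scan: 'i < P2M_TOP_ENTRIES' stops at index 511.
 * Returns the number of entries visited. */
size_t scan_top_fixed(void)
{
    size_t visited = 0;
    for (size_t i = 0; i < P2M_TOP_ENTRIES; i++) {
        (void)top_entry(i);
        visited++;
    }
    return visited;
}
```

`scan_top_fixed()` visits exactly 512 entries, indices 0 through 511.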
    • powerpc: Don't use __put_user() in patch_instruction · 636802ef
      Committed by Benjamin Herrenschmidt
      patch_instruction() can be called very early on ppc32, when the kernel
      isn't yet running at its linked address. That can cause the
      !is_kernel_addr() test in __put_user() to trip and call might_sleep(),
      which is very bad at that point during boot.
      
      Use a lower-level function instead for now, at least until we get
      around to reworking the ppc32 boot process to do the code patching
      later, like ppc64 does.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
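A rough user-space model of why the address check misfires early in boot. It assumes a ppc32 kernel linked at 0xc0000000 but still executing near physical address 0. `checked_store()`, `raw_store()` and the `might_sleep_calls` counter are hypothetical; the real code uses __put_user() and a lower-level store. The address is passed separately from the pointer only so the model can run in user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ppc32 layout: kernel linked at this virtual base. */
#define PAGE_OFFSET 0xc0000000u

bool is_kernel_addr(uint32_t addr) { return addr >= PAGE_OFFSET; }

/* Counts how often the model would have called might_sleep(). */
int might_sleep_calls;

/* Models a checked helper: a "non-kernel" address takes the
 * user-access path, which may call might_sleep(). */
void checked_store(uint32_t *slot, uint32_t addr, uint32_t insn)
{
    if (!is_kernel_addr(addr))
        might_sleep_calls++;   /* stands in for might_sleep() */
    *slot = insn;
}

/* Models a lower-level store with no address classification. */
void raw_store(uint32_t *slot, uint32_t insn) { *slot = insn; }
```

Before relocation, a kernel text address such as 0x00001000 fails `is_kernel_addr()`, so the checked path counts a might_sleep() call. The raw store never does.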
    • powerpc: Make sure IPI handlers see data written by IPI senders · 9fb1b36c
      Committed by Paul Mackerras
      We have been observing hangs, both of KVM guest vcpu tasks and more
      generally, where a process that is woken doesn't properly wake up and
      continue to run, but instead sticks in TASK_WAKING state.  This
      happens because the update of rq->wake_list in ttwu_queue_remote()
      is not ordered with the update of ipi_message in
      smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
      scheduler_ipi() is not ordered with the reading of ipi_message in
      smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
      the updated rq->wake_list and therefore conclude that there is nothing
      for it to do.
      
      In order to make sure that anything done before smp_send_reschedule()
      is ordered before anything done in the resulting call to scheduler_ipi(),
      this adds barriers in smp_muxed_ipi_message_pass() and smp_ipi_demux().
      The barrier in smp_muxed_ipi_message_pass() is a full barrier to ensure that
      there is a full ordering between the smp_send_reschedule() caller and
      scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
      xchg_local() because xchg() includes release and acquire barriers.
      Using xchg() rather than xchg_local() makes sense given that
      ipi_message is not just accessed locally.
      
      This moves the barrier between setting the message and calling the
      cause_ipi() function into the individual cause_ipi implementations.
      Most of them -- those that used outb, out_8 or similar -- already had
      a full barrier because out_8 etc. include a sync before the MMIO
      store.  This adds an explicit barrier in the two remaining cases.
      
      These changes made no measurable difference to the speed of IPIs as
      measured using a simple ping-pong latency test across two CPUs on
      different cores of a POWER7 machine.
      
      The analysis of the reason why processes were not waking up properly
      is due to Milton Miller.
      
      Cc: stable@vger.kernel.org # v3.0+
      Reported-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
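The required ordering is the classic message-passing pattern. Below is a user-space sketch using C11 atomics and POSIX threads, not the kernel's xchg() and smp_mb(). `wake_list`, `ipi_message` and `MSG_RESCHEDULE` are hypothetical analogues of the fields named above. The sender's seq_cst `atomic_fetch_or()` acts as the release side, and the receiver's seq_cst `atomic_exchange()` (the xchg() analogue) acts as the acquire side.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical analogues: wake_list ~ rq->wake_list,
 * ipi_message ~ the per-CPU muxed message word. */
static int wake_list;
static atomic_uint ipi_message;

#define MSG_RESCHEDULE 1u

/* Receiver, like smp_ipi_demux(): fetch-and-clear the message word. */
static void *receiver(void *arg)
{
    unsigned int msgs;
    while ((msgs = atomic_exchange(&ipi_message, 0)) == 0)
        ;   /* spin until the "IPI" arrives */
    if (msgs & MSG_RESCHEDULE)
        *(int *)arg = wake_list;   /* scheduler_ipi() analogue reads the data */
    return NULL;
}

/* Sender: publish data, then set the message bit. Returns what the
 * receiver observed in wake_list. */
int run_ipi_demo(void)
{
    pthread_t t;
    int seen = -1;

    wake_list = 0;
    atomic_store(&ipi_message, 0);
    if (pthread_create(&t, NULL, receiver, &seen) != 0)
        return -1;
    wake_list = 42;   /* ttwu_queue_remote() analogue */
    /* Like smp_muxed_ipi_message_pass(): a seq_cst read-modify-write
     * orders the data write before the message bit becomes visible. */
    atomic_fetch_or(&ipi_message, MSG_RESCHEDULE);
    pthread_join(t, NULL);
    return seen;
}
```

The receiver only gets a nonzero message by reading the sender's `fetch_or`. That read synchronizes with the sender, so the receiver is guaranteed to see `wake_list == 42`. The bug described in the commit is what can happen when either side lacks this ordering.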
    • powerpc: Restore correct DSCR in context switch · 71433285
      Committed by Anton Blanchard
      During a context switch we always restore the per-thread DSCR value.
      If we aren't doing explicit DSCR management
      (i.e. thread.dscr_inherit == 0) and the default DSCR changed while
      the process was sleeping, we end up with the wrong value.
      
      Check thread.dscr_inherit and select the default DSCR or the per-thread
      DSCR as required.
      
      This was found with the following test case, when running with
      more threads than CPUs (i.e. forcing context switching):
      
      http://ozlabs.org/~anton/junkcode/dscr_default_test.c
      
      With the four patches applied I can run a combination of all
      test cases successfully at the same time:
      
      http://ozlabs.org/~anton/junkcode/dscr_default_test.c
      http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
      http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: <stable@kernel.org> # 3.0+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
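A minimal sketch of the selection rule described above. `struct thread_struct`, `dscr_default` and `dscr_for_switch()` are hypothetical user-space stand-ins for the kernel fields, not the real context-switch code.

```c
#include <stdint.h>

/* Hypothetical model of the per-thread state named in the commit. */
struct thread_struct {
    uint64_t dscr;      /* value the thread set explicitly */
    int dscr_inherit;   /* nonzero: thread manages its own DSCR */
};

/* Stand-in for the system-wide default (sysfs dscr_default). */
uint64_t dscr_default;

/* DSCR to load when switching to 'next': use the thread's own value
 * only under explicit management, otherwise the current default. */
uint64_t dscr_for_switch(const struct thread_struct *next)
{
    return next->dscr_inherit ? next->dscr : dscr_default;
}
```

A thread with `dscr_inherit == 0` picks up whatever the default is at switch-in time, even if the default changed while it slept.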
    • powerpc: Fix DSCR inheritance in copy_thread() · 1021cb26
      Committed by Anton Blanchard
      If the default DSCR is nonzero, we set thread.dscr_inherit in
      copy_thread(), meaning the new thread and all its children will ignore
      future updates to the default DSCR. This is not intended and is
      a change in behaviour that a number of our users have hit.
      
      We just need to inherit thread.dscr and thread.dscr_inherit from
      the parent, which ends up being much simpler.
      
      This was found with the following test case:
      
      http://ozlabs.org/~anton/junkcode/dscr_default_test.c
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: <stable@kernel.org> # 3.0+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
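The fixed inheritance rule amounts to a plain copy of both fields from the parent. This is a sketch with hypothetical names (`copy_dscr()`, `effective_dscr()`), not the real copy_thread().

```c
#include <stdint.h>

/* Hypothetical model of the per-thread DSCR state. */
struct thread_struct {
    uint64_t dscr;
    int dscr_inherit;
};

uint64_t dscr_default;   /* stand-in for sysfs dscr_default */

/* Fixed behaviour, modelled: the child takes both fields from its
 * parent and nothing else; the current default is not consulted. */
void copy_dscr(struct thread_struct *child, const struct thread_struct *parent)
{
    child->dscr = parent->dscr;
    child->dscr_inherit = parent->dscr_inherit;
}

/* DSCR a thread would run with: its own only under explicit management. */
uint64_t effective_dscr(const struct thread_struct *t)
{
    return t->dscr_inherit ? t->dscr : dscr_default;
}
```

A child forked from a parent that never set the DSCR keeps `dscr_inherit == 0`. It therefore follows later changes to the default, even if the default was nonzero at fork time.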
    • powerpc: Keep thread.dscr and thread.dscr_inherit in sync · 00ca0de0
      Committed by Anton Blanchard
      When we update the DSCR, either via emulation of mtspr(DSCR) or via
      a change to dscr_default in sysfs, we don't update thread.dscr.
      We will eventually update it at context switch time, but there is
      a period where thread.dscr is incorrect.
      
      If we fork at this point we will copy the old value of thread.dscr
      into the child. To avoid this, always keep thread.dscr in sync with
      reality.
      
      This issue was found with the following test case:
      
      http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: <stable@kernel.org> # 3.0+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
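A sketch of the mtspr(DSCR) emulation path only, keeping the register and the thread copy in step so that a fork in between sees the new value. `hw_dscr`, `emulate_mtspr_dscr()` and `fork_copy()` are hypothetical. Setting `dscr_inherit = 1` on an explicit write is an assumption about how explicit management is flagged.

```c
#include <stdint.h>

/* Hypothetical model of the per-thread DSCR state. */
struct thread_struct {
    uint64_t dscr;
    int dscr_inherit;
};

/* Stand-in for the current CPU's DSCR register. */
uint64_t hw_dscr;

/* Emulated mtspr(DSCR): update the register AND thread.dscr together,
 * so there is no window where thread.dscr is stale.
 * Assumption: an explicit write marks the thread as self-managed. */
void emulate_mtspr_dscr(struct thread_struct *cur, uint64_t val)
{
    hw_dscr = val;
    cur->dscr = val;
    cur->dscr_inherit = 1;
}

/* fork() copies the thread state as it stands right now. */
void fork_copy(struct thread_struct *child, const struct thread_struct *parent)
{
    child->dscr = parent->dscr;
    child->dscr_inherit = parent->dscr_inherit;
}
```

A child forked immediately after the write inherits the new value, with no need to wait for a context switch.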
    • powerpc: Update DSCR on all CPUs when writing sysfs dscr_default · 1b6ca2a6
      Committed by Anton Blanchard
      Writing to dscr_default in sysfs doesn't actually change the DSCR;
      we rely on a context switch on each CPU to do the work. There is no
      guarantee we will get a context switch in a reasonable amount of time,
      so fire off an IPI to force an immediate change.
      
      This issue was found with the following test case:
      
      http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: <stable@kernel.org> # 3.0+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
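A user-space model of "record the default, then push it to every CPU". `NR_CPUS_MODEL`, `cpu_dscr[]` and `for_each_cpu_call()` are hypothetical. In the kernel the broadcast would be an IPI-driven call on each CPU, whereas here it is just a loop.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count */

uint64_t cpu_dscr[NR_CPUS_MODEL];   /* per-CPU DSCR register model */
uint64_t dscr_default;

/* Runs "on" one CPU; in the kernel this would execute from an IPI. */
static void update_dscr_on(size_t cpu, void *info)
{
    cpu_dscr[cpu] = *(const uint64_t *)info;
}

/* Hypothetical broadcast helper: calls fn for every CPU in turn. */
static void for_each_cpu_call(void (*fn)(size_t, void *), void *info)
{
    for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        fn(cpu, info);
}

/* sysfs store: record the new default, then apply it everywhere now
 * instead of waiting for each CPU to context switch. */
void store_dscr_default(uint64_t val)
{
    dscr_default = val;
    for_each_cpu_call(update_dscr_on, &val);
}
```

After `store_dscr_default(0x10)`, every modelled CPU register holds 0x10 immediately.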
    • powerpc/powernv: Always go into nap mode when CPU is offline · 375f561a
      Committed by Paul Mackerras
      The CPU hotplug code for the powernv platform currently only puts
      offline CPUs into nap mode if the powersave_nap variable is set.
      However, HV-style KVM on this platform requires secondary CPU threads
      to be offline and in nap mode.  Since we know nap mode works just
      fine on all POWER7 machines, and the only machines that support the
      powernv platform are POWER7 machines, this changes the code to
      always put offline CPUs into nap mode, regardless of powersave_nap.
      powersave_nap still controls whether or not CPUs go into nap mode
      when idle, as before.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
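The new policy fits in one decision function. The `cpu_state` enum and `should_nap()` are hypothetical names used for illustration. Only `powersave_nap` comes from the commit text.

```c
#include <stdbool.h>

bool powersave_nap;   /* models the powersave_nap tunable */

enum cpu_state { CPU_IDLE, CPU_OFFLINE };

/* Policy described in the commit: offline threads always nap (HV-style
 * KVM needs this); idle threads nap only if powersave_nap is set. */
bool should_nap(enum cpu_state state)
{
    if (state == CPU_OFFLINE)
        return true;
    return powersave_nap;
}
```

Toggling `powersave_nap` changes only the idle case. Offline threads nap either way.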
    • powerpc: Give hypervisor decrementer interrupts their own handler · dabe859e
      Committed by Paul Mackerras
      At the moment the handler for hypervisor decrementer interrupts is
      the same as for decrementer interrupts, i.e. timer_interrupt().
      This is bogus; if we ever do get a hypervisor decrementer interrupt
      it won't have anything to do with the next timer event.  In fact
      the only time we get hypervisor decrementer interrupts is when one
      is left pending on exit from a KVM guest.
      
      When we get a hypervisor decrementer interrupt we don't need to do
      anything special to clear it, since they are edge-triggered on the
      transition of HDEC from 0 to -1.  Thus this adds an empty handler
      function for them.  We don't need to have them masked when interrupts
      are soft-disabled, so we use STD_EXCEPTION_HV instead of
      MASKABLE_EXCEPTION_HV.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
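A dispatch-table sketch of the change. The exception enum, `hdec_interrupt()` and `dispatch()` are hypothetical stand-ins, and the real change is made in the powerpc exception vectors. The point is that the hypervisor decrementer entry gets an empty handler instead of timer_interrupt(), because an edge-triggered HDEC needs no acknowledgement.

```c
#include <stddef.h>

/* Hypothetical minimal register frame. */
struct pt_regs { int dummy; };

enum exc { EXC_DECREMENTER, EXC_HV_DECREMENTER, EXC_COUNT };

int timer_events;   /* counts entries into the timer path */

static void timer_interrupt(struct pt_regs *regs) { (void)regs; timer_events++; }

/* HDEC is edge-triggered on the 0 -> -1 transition, so there is
 * nothing to clear: the handler simply returns. */
static void hdec_interrupt(struct pt_regs *regs) { (void)regs; }

static void (*const handlers[EXC_COUNT])(struct pt_regs *) = {
    [EXC_DECREMENTER]    = timer_interrupt,
    [EXC_HV_DECREMENTER] = hdec_interrupt,   /* previously timer_interrupt */
};

void dispatch(enum exc e, struct pt_regs *regs) { handlers[e](regs); }
```

A stray HDEC left pending after a KVM guest exit no longer runs the timer path.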