1. 30 5月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras 提交于
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32fad281
    • L
      KVM: PPC: Factor out guest epapr initialization · 2e1ae9c0
      Liu Yu-B13201 提交于
      epapr paravirtualization support is now a Kconfig
      selectable option
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      [stuart.yoder@freescale.com: misc minor fixes, description update]
      Signed-off-by: NStuart Yoder <stuart.yoder@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2e1ae9c0
  2. 28 5月, 2012 1 次提交
  3. 22 5月, 2012 1 次提交
  4. 17 5月, 2012 1 次提交
    • S
      fork: move the real prepare_to_copy() users to arch_dup_task_struct() · 55ccf3fe
      Suresh Siddha 提交于
      Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
      the architectures and the rest following the x86 model of flushing the extended
      register state like fpu there.
      
      Remove it and use the arch_dup_task_struct() instead.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.comAcked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      55ccf3fe
  5. 16 5月, 2012 1 次提交
  6. 14 5月, 2012 3 次提交
  7. 12 5月, 2012 1 次提交
  8. 09 5月, 2012 3 次提交
    • P
      sched/numa: Rewrite the CONFIG_NUMA sched domain support · cb83b629
      Peter Zijlstra 提交于
      The current code groups up to 16 nodes in a level and then puts an
      ALLNODES domain spanning the entire tree on top of that. This doesn't
      reflect the numa topology and esp for the smaller not-fully-connected
      machines out there today this might make a difference.
      
      Therefore, build a proper numa topology based on node_distance().
      
      Since there's no fixed numa layers anymore, the static SD_NODE_INIT
      and SD_ALLNODES_INIT aren't usable anymore, the new code tries to
      construct something similar and scales some values either on the
      number of cpus in the domain and/or the node_distance() ratio.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-sh@vger.kernel.org
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: sparclinux@vger.kernel.org
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Greg Pearson <greg.pearson@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: bob.picco@oracle.com
      Cc: chris.mason@oracle.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-r74n3n8hhuc2ynbrnp3vt954@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb83b629
    • T
      powerpc: Remove now unused _TIF_RUNLATCH · a7243c1d
      Tiejun Chen 提交于
      'TIF_RUNLATCH' is already dropped from
      commit fe1952fc
      
      	powerpc: Rework runlatch code
      
      So '_TIF_RUNLATCH' should be removed as well.
      Signed-off-by: NTiejun Chen <tiejun.chen@windriver.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a7243c1d
    • B
      powerpc/irq: Make alignment & program interrupt behave the same · a3512b2d
      Benjamin Herrenschmidt 提交于
      Alignment was the last user of the ENABLE_INTS macro, which we can
      now remove. All non-syscall exceptions now disable interrupts on
      entry, they get re-enabled conditionally from C code. Don't
      unconditionally re-enable in program check either, check the
      original context.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a3512b2d
  9. 08 5月, 2012 2 次提交
  10. 06 5月, 2012 6 次提交
  11. 30 4月, 2012 8 次提交
  12. 25 4月, 2012 1 次提交
  13. 23 4月, 2012 1 次提交
  14. 20 4月, 2012 2 次提交
  15. 12 4月, 2012 1 次提交
  16. 11 4月, 2012 1 次提交
  17. 08 4月, 2012 5 次提交
    • A
      KVM: PPC: Pass EA to updating emulation ops · 6020c0f6
      Alexander Graf 提交于
      When emulating updating load/store instructions (lwzu, stwu, ...) we need to
      write the effective address of the load/store into a register.
      
      Currently, we write the physical address in there, which is very wrong. So
      instead let's save off where the virtual fault was on MMIO and use that
      information as value to put into the register.
      
      While at it, also move the XOP variants of the above instructions to the new
      scheme of using the already known vaddr instead of calculating it themselves.
      Reported-by: NJörg Sommer <joerg@alea.gnuu.de>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      6020c0f6
    • P
      KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log · 0456ec4f
      Paul Mackerras 提交于
      This adds code to measure "stolen" time per virtual core in units of
      timebase ticks, and to report the stolen time to the guest using the
      dispatch trace log (DTL).  The guest can register an area of memory
      for the DTL for a given vcpu.  The DTL is a ring buffer where KVM
      fills in one entry every time it enters the guest for that vcpu.
      
      Stolen time is measured as time when the virtual core is not running,
      either because the vcore is not runnable (e.g. some of its vcpus are
      executing elsewhere in the kernel or in userspace), or when the vcpu
      thread that is running the vcore is preempted.  This includes time
      when all the vcpus are idle (i.e. have executed the H_CEDE hypercall),
      which is OK because the guest accounts stolen time while idle as idle
      time.
      
      Each vcpu keeps a record of how much stolen time has been reported to
      the guest for that vcpu so far.  When we are about to enter the guest,
      we create a new DTL entry (if the guest vcpu has a DTL) and report the
      difference between total stolen time for the vcore and stolen time
      reported so far for the vcpu as the "enqueue to dispatch" time in the
      DTL entry.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      0456ec4f
    • P
      KVM: PPC: Book3S HV: Make virtual processor area registration more robust · 2e25aa5f
      Paul Mackerras 提交于
      The PAPR API allows three sorts of per-virtual-processor areas to be
      registered (VPA, SLB shadow buffer, and dispatch trace log), and
      furthermore, these can be registered and unregistered for another
      virtual CPU.  Currently we just update the vcpu fields pointing to
      these areas at the time of registration or unregistration.  If this
      is done on another vcpu, there is the possibility that the target vcpu
      is using those fields at the time and could end up using a bogus
      pointer and corrupting memory.
      
      This fixes the race by making the target cpu itself do the update, so
      we can be sure that the update happens at a time when the fields
      aren't being used.  Each area now has a struct kvmppc_vpa which is
      used to manage these updates.  There is also a spinlock which protects
      access to all of the kvmppc_vpa structs, other than to the pinned_addr
      fields.  (We could have just taken the spinlock when using the vpa,
      slb_shadow or dtl fields, but that would mean taking the spinlock on
      every guest entry and exit.)
      
      This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry',
      which is what the rest of the kernel uses.
      
      Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out
      the need to initialize vcpu->arch.vpa_update_lock.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      2e25aa5f
    • P
      KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs · f0888f70
      Paul Mackerras 提交于
      Currently on POWER7, if we are running the guest on a core and we don't
      need all the hardware threads, we do nothing to ensure that the unused
      threads aren't executing in the kernel (other than checking that they
      are offline).  We just assume they're napping and we don't do anything
      to stop them trying to enter the kernel while the guest is running.
      This means that a stray IPI can wake up the hardware thread and it will
      then try to enter the kernel, but since the core is in guest context,
      it will execute code from the guest in hypervisor mode once it turns the
      MMU on, which tends to lead to crashes or hangs in the host.
      
      This fixes the problem by adding two new one-byte flags in the
      kvmppc_host_state structure in the PACA which are used to interlock
      between the primary thread and the unused secondary threads when entering
      the guest.  With these flags, the primary thread can ensure that the
      unused secondaries are not already in kernel mode (i.e. handling a stray
      IPI) and then indicate that they should not try to enter the kernel
      if they do get woken for any reason.  Instead they will go into KVM code,
      find that there is no vcpu to run, acknowledge and clear the IPI and go
      back to nap mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f0888f70
    • A
      KVM: PPC: booke: Reinject performance monitor interrupts · 7cc1e8ee
      Alexander Graf 提交于
      When we get a performance monitor interrupt, we need to make sure that
      the host receives it. So reinject it like we reinject the other host
      destined interrupts.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7cc1e8ee