1. 17 10月, 2013 14 次提交
    • P
      KVM: PPC: Book3S PR: Fix compilation without CONFIG_ALTIVEC · f2481771
      Paul Mackerras 提交于
      Commit 9d1ffdd8 ("KVM: PPC: Book3S PR: Don't corrupt guest state
      when kernel uses VMX") added a call to kvmppc_load_up_altivec() that
      isn't guarded by CONFIG_ALTIVEC, causing a link failure when building
      a kernel without CONFIG_ALTIVEC set.  This adds an #ifdef to fix this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f2481771
    • P
      KVM: PPC: Book3S HV: Don't crash host on unknown guest interrupt · f3271d4c
      Paul Mackerras 提交于
      If we come out of a guest with an interrupt that we don't know about,
      instead of crashing the host with a BUG(), we now return to userspace
      with the exit reason set to KVM_EXIT_UNKNOWN and the trap vector in
      the hw.hardware_exit_reason field of the kvm_run structure, as is done
      on x86.  Note that run->exit_reason is already set to KVM_EXIT_UNKNOWN
      at the beginning of kvmppc_handle_exit().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f3271d4c
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9
      Paul Mackerras 提交于
      POWER7 and later IBM server processors have a register called the
      Program Priority Register (PPR), which controls the priority of
      each hardware CPU SMT thread, and affects how fast it runs compared
      to other SMT threads.  This priority can be controlled by writing to
      the PPR or by use of a set of instructions of the form or rN,rN,rN
      which are otherwise no-ops but have been defined to set the priority
      to particular levels.
      
      This adds code to context switch the PPR when entering and exiting
      guests and to make the PPR value accessible through the SET/GET_ONE_REG
      interface.  When entering the guest, we set the PPR as late as
      possible, because if we are setting a low thread priority it will
      make the code run slowly from that point on.  Similarly, the
      first-level interrupt handlers save the PPR value in the PACA very
      early on, and set the thread priority to the medium level, so that
      the interrupt handling code runs at a reasonable speed.
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4b8473c9
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: BookE: Add GET/SET_ONE_REG interface for VRSAVE · 8b75cbbe
      Paul Mackerras 提交于
      This makes the VRSAVE register value for a vcpu accessible through
      the GET/SET_ONE_REG interface on Book E systems (in addition to the
      existing GET/SET_SREGS interface), for consistency with Book 3S.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8b75cbbe
    • P
      KVM: PPC: Book3S HV: Avoid unbalanced increments of VPA yield count · 8c2dbb79
      Paul Mackerras 提交于
      The yield count in the VPA is supposed to be incremented every time
      we enter the guest, and every time we exit the guest, so that its
      value is even when the vcpu is running in the guest and odd when it
      isn't.  However, it's currently possible that we increment the yield
      count on the way into the guest but then find that other CPU threads
      are already exiting the guest, so we go back to nap mode via the
      secondary_too_late label.  In this situation we don't increment the
      yield count again, breaking the relationship between the LSB of the
      count and whether the vcpu is in the guest.
      
      To fix this, we move the increment of the yield count to a point
      after we have checked whether other CPU threads are exiting.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8c2dbb79
    • P
      KVM: PPC: Book3S HV: Pull out interrupt-reading code into a subroutine · c934243c
      Paul Mackerras 提交于
      This moves the code in book3s_hv_rmhandlers.S that reads any pending
      interrupt from the XICS interrupt controller, and works out whether
      it is an IPI for the guest, an IPI for the host, or a device interrupt,
      into a new function called kvmppc_read_intr.  Later patches will
      need this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c934243c
    • P
      KVM: PPC: Book3S HV: Restructure kvmppc_hv_entry to be a subroutine · 218309b7
      Paul Mackerras 提交于
      We have two paths into and out of the low-level guest entry and exit
      code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the
      system reset vector for an offline secondary thread on POWER7 via
      kvm_start_guest.  Currently both just branch to kvmppc_hv_entry to
      enter the guest, and on guest exit, we test the vcpu physical thread
      ID to detect which way we came in and thus whether we should return
      to the vcpu task or go back to nap mode.
      
      In order to make the code flow clearer, and to keep the code relating
      to each flow together, this turns kvmppc_hv_entry into a subroutine
      that follows the normal conventions for call and return.  This means
      that kvmppc_hv_entry_trampoline() and kvmppc_hv_entry() now establish
      normal stack frames, and we use the normal stack slots for saving
      return addresses rather than local_paca->kvm_hstate.vmhandler.  Apart
      from that this is mostly moving code around unchanged.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      218309b7
    • P
      KVM: PPC: Book3S HV: Implement H_CONFER · 42d7604d
      Paul Mackerras 提交于
      The H_CONFER hypercall is used when a guest vcpu is spinning on a lock
      held by another vcpu which has been preempted, and the spinning vcpu
      wishes to give its timeslice to the lock holder.  We implement this
      in the straightforward way using kvm_vcpu_yield_to().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      42d7604d
    • P
      KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE · c0867fd5
      Paul Mackerras 提交于
      The VRSAVE register value for a vcpu is accessible through the
      GET/SET_SREGS interface for Book E processors, but not for Book 3S
      processors.  In order to make this accessible for Book 3S processors,
      this adds a new register identifier for GET/SET_ONE_REG, and adds
      the code to implement it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c0867fd5
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
    • P
      KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789
      Paul Mackerras 提交于
      Currently we are not saving and restoring the SIAR and SDAR registers in
      the PMU (performance monitor unit) on guest entry and exit.  The result
      is that performance monitoring tools in the guest could get false
      information about where a program was executing and what data it was
      accessing at the time of a performance monitor interrupt.  This fixes
      it by saving and restoring these registers along with the other PMU
      registers on guest entry/exit.
      
      This also provides a way for userspace to access these values for a
      vcpu via the one_reg interface.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      14941789
    • M
      KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg · 3b783474
      Michael Neuling 提交于
      This reserves space in get/set_one_reg ioctl for the extra guest state
      needed for POWER8.  It doesn't implement these at all, it just reserves
      them so that the ABI is defined now.
      
      A few things to note here:
      
      - This add *a lot* state for transactional memory.  TM suspend mode,
        this is unavoidable, you can't simply roll back all transactions and
        store only the checkpointed state.  I've added this all to
        get/set_one_reg (including GPRs) rather than creating a new ioctl
        which returns a struct kvm_regs like KVM_GET_REGS does.  This means we
        if we need to extract the TM state, we are going to need a bucket load
        of IOCTLs.  Hopefully most of the time this will not be needed as we
        can look at the MSR to see if TM is active and only grab them when
        needed.  If this becomes a bottle neck in future we can add another
        ioctl to grab all this state in one go.
      
      - The TM state is offset by 0x80000000.
      
      - For TM, I've done away with VMX and FP and created a single 64x128 bit
        VSX register space.
      
      - I've left a space of 1 (at 0x9c) since Paulus needs to add a value
        which applies to POWER7 as well.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3b783474
  2. 14 10月, 2013 1 次提交
  3. 13 9月, 2013 2 次提交
  4. 12 9月, 2013 1 次提交
    • N
      mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Naoya Horiguchi 提交于
      Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing,) so we had better not enable migration of
      other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
      page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (softoffline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture dependent check of whether hugepage are implemented on a pmd
      basis or not.  And on some architecture multiple sizes of hugepages are
      available, so hugepage_migration_support() also checks hugepage size.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83467efb
  5. 11 9月, 2013 4 次提交
    • V
      powerpc: Default arch idle could cede processor on pseries · 363edbe2
      Vaidyanathan Srinivasan 提交于
      When adding cpuidle support to pSeries, we introduced two
      regressions:
      
        - The new cpuidle backend driver only works under hypervisors
          supporting the "SLPLAR" option, which isn't the case of the
          old POWER4 hypervisor and the HV "light" used on js2x blades
      
        - The cpuidle driver registers fairly late, meaning that for
          a significant portion of the boot process, we end up having
          all threads spinning. This slows down the boot process and
          increases the overall resource usage if the hypervisor has
          shared processors.
      
      This fixes both by implementing a "default" idle that will cede
      to the hypervisor when possible, in a very simple way without
      all the bells and whisles of cpuidle.
      Reported-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org>
      363edbe2
    • V
      powerpc: Fix section mismatch warning for prom_rtas_call · 620e5050
      Vladimir Murzin 提交于
      While cross-building for PPC64 I've got
      
      WARNING: vmlinux.o(.text.unlikely+0x1ba): Section mismatch in
      reference from the function .prom_rtas_call() to the variable
      .init.data:dt_string_start The function .prom_rtas_call() references
      the variable __initdata dt_string_start.  This is often because
      .prom_rtas_call lacks a __initdata annotation or the annotation of
      dt_string_start is wrong.
      
      WARNING: vmlinux.o(.meminit.text+0xeb0): Section mismatch in reference
      from the function .free_area_init_core.isra.47() to the function
      .init.text:.set_pageblock_order() The function __meminit
      .free_area_init_core.isra.47() references a function __init
      .set_pageblock_order().  If .set_pageblock_order is only used by
      .free_area_init_core.isra.47 then annotate .set_pageblock_order with a
      matching annotation.
      
      Fix it by proper annotation of prom_rtas_call.
      Signed-off-by: NVladimir Murzin <murzin.v@gmail.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      620e5050
    • A
      powerpc: Fix possible deadlock on page fault · 69e044dd
      Aneesh Kumar K.V 提交于
       stack_grow_into/14082 is trying to acquire lock:
        (&mm->mmap_sem){++++++}, at: [<c000000000206d28>] .might_fault+0x78/0xe0
      
       but task is already holding lock:
        (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&mm->mmap_sem);
         lock(&mm->mmap_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       1 lock held by stack_grow_into/14082:
        #0:  (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910
      
       stack backtrace:
       CPU: 21 PID: 14082 Comm: stack_grow_into Not tainted 3.10.0-10.el7.ppc64.debug #1
       Call Trace:
       [c0000003d396b850] [c000000000016e7c] .show_stack+0x7c/0x1f0 (unreliable)
       [c0000003d396b920] [c000000000813fc8] .dump_stack+0x28/0x3c
       [c0000003d396b990] [c000000000124b90] .__lock_acquire+0x1640/0x1800
       [c0000003d396bab0] [c00000000012570c] .lock_acquire+0xac/0x250
       [c0000003d396bb80] [c000000000206d54] .might_fault+0xa4/0xe0
       [c0000003d396bbf0] [c0000000007ffe2c] .do_page_fault+0x2ec/0x910
       [c0000003d396be30] [c0000000000092e8] handle_page_fault+0x10/0x30
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      69e044dd
    • G
      powerpc: Export cpu_to_chip_id() to fix build error · 256588fd
      Guenter Roeck 提交于
      powerpc allmodconfig build fails with:
      
      ERROR: ".cpu_to_chip_id" [drivers/block/mtip32xx/mtip32xx.ko] undefined!
      
      The problem was introduced with commit 15863ff3 (powerpc: Make chip-id
      information available to userspace).
      
      Export the missing symbol.
      
      Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Cc: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      256588fd
  6. 05 9月, 2013 2 次提交
  7. 04 9月, 2013 4 次提交
  8. 29 8月, 2013 2 次提交
  9. 28 8月, 2013 6 次提交
  10. 27 8月, 2013 4 次提交
    • B
      powerpc/powernv: Return secondary CPUs to firmware on kexec · 13906db6
      Benjamin Herrenschmidt 提交于
      With OPAL v3 we can return secondary CPUs to firmware on kexec. This
      allows firmware to do various cleanups making things generally more
      reliable, and will enable the "new" kernel to call OPAL to perform
      some reconfiguration tasks early on that can only be done while
      all the CPUs are in firmware.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      13906db6
    • P
      powerpc: Work around gcc miscompilation of __pa() on 64-bit · bdbc29c1
      Paul Mackerras 提交于
      On 64-bit, __pa(&static_var) gets miscompiled by recent versions of
      gcc as something like:
      
              addis 3,2,.LANCHOR1+4611686018427387904@toc@ha
              addi 3,3,.LANCHOR1+4611686018427387904@toc@l
      
      This ends up effectively ignoring the offset, since its bottom 32 bits
      are zero, and means that the result of __pa() still has 0xC in the top
      nibble.  This happens with gcc 4.8.1, at least.
      
      To work around this, for 64-bit we make __pa() use an AND operator,
      and for symmetry, we make __va() use an OR operator.  Using an AND
      operator rather than a subtraction ends up with slightly shorter code
      since it can be done with a single clrldi instruction, whereas it
      takes three instructions to form the constant (-PAGE_OFFSET) and add
      it on.  (Note that MEMORY_START is always 0 on 64-bit.)
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      bdbc29c1
    • B
      powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor · f5f6cbb6
      Benjamin Herrenschmidt 提交于
      /proc/powerpc/lparcfg is an ancient facility (though still actively used)
      which allows access to some informations relative to the partition when
      running underneath a PAPR compliant hypervisor.
      
      It makes no sense on non-pseries machines. However, currently, not only
      can it be created on these if the kernel has pseries support, but accessing
      it on such a machine will crash due to trying to do hypervisor calls.
      
      In fact, it should also not do HV calls on older pseries that didn't have
      an hypervisor either.
      
      Finally, it has the plumbing to be a module but is a "bool" Kconfig option.
      
      This fixes the whole lot by turning it into a machine_device_initcall
      that is only created on pseries, and adding the necessary hypervisor
      check before calling the H_GET_EM_PARMS hypercall
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f5f6cbb6
    • B
      powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32 · ee372bc1
      Benjamin Herrenschmidt 提交于
      The "rmci" stuff only exists on 64-bit
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ee372bc1