1. 07 4月, 2014 3 次提交
  2. 29 3月, 2014 3 次提交
    • P
      KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 · 72cde5a8
      Paul Mackerras 提交于
      Currently we save the host PMU configuration, counter values, etc.,
      when entering a guest, and restore it on return from the guest.
      (We have to do this because the guest has control of the PMU while
      it is executing.)  However, we missed saving/restoring the SIAR and
      SDAR registers, as well as the registers which are new on POWER8,
      namely SIER and MMCR2.
      
      This adds code to save the values of these registers when entering
      the guest and restore them on exit.  This also works around the bug
      in POWER8 where setting PMAE with a counter already negative doesn't
      generate an interrupt.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      72cde5a8
    • P
      KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Paul Mackerras 提交于
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      797f9c07
    • M
      KVM: PPC: Book3S HV: Add transactional memory support · e4e38121
      Michael Neuling 提交于
      This adds saving of the transactional memory (TM) checkpointed state
      on guest entry and exit.  We only do this if we see that the guest has
      an active transaction.
      
      It also adds emulation of the TM state changes when delivering IRQs
      into the guest.  According to the architecture, if we are
      transactional when an IRQ occurs, the TM state is changed to
      suspended, otherwise it's left unchanged.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      e4e38121
  3. 26 3月, 2014 2 次提交
    • L
      KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE · 69e9fbb2
      Laurent Dufour 提交于
      This introduces the H_GET_TCE hypervisor call, which is basically the
      reverse of H_PUT_TCE, as defined in the Power Architecture Platform
      Requirements (PAPR).
      
      The hcall H_GET_TCE is required by the kdump kernel, which uses it to
      retrieve TCEs set up by the previous (panicked) kernel.
      Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      69e9fbb2
    • G
      KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write · e59d24e6
      Greg Kurz 提交于
      When the guest does an MMIO write which is handled successfully by an
      ioeventfd, ioeventfd_write() returns 0 (success) and
      kvmppc_handle_store() returns EMULATE_DONE.  Then
      kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and
      this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an
      exit back to userspace with a bogus exit reason code, typically
      causing userspace (e.g. qemu) to crash with a message about an unknown
      exit code.
      
      This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order
      to fix that.  For generality, we define a helper to check for either
      of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV,
      to make it easy to check for either and provide one place to update if
      any other return-to-guest code gets defined in future.
      
      Since it only affects Book3S HV for now, the helper is added to
      the kvm_book3s.h header file.
      
      We use the helper in two places in kvmppc_run_core() as well for
      future-proofing, though we don't see RESUME_GUEST_NV in either place
      at present.
      
      [paulus@samba.org - combined 4 patches into one, rewrote description]
      Suggested-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      e59d24e6
  4. 24 3月, 2014 10 次提交
  5. 20 3月, 2014 7 次提交
    • H
      random: Add arch_has_random[_seed]() · 7b878d4b
      H. Peter Anvin 提交于
      Add predicate functions for having arch_get_random[_seed]*().  The
      only current use is to avoid the loop in arch_random_refill() when
      arch_get_random_seed_long() is unavailable.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7b878d4b
    • H
      x86, random: Enable the RDSEED instruction · d20f78d2
      H. Peter Anvin 提交于
      Upcoming Intel silicon adds a new RDSEED instruction, which is similar
      to RDRAND but provides a stronger guarantee: unlike RDRAND, RDSEED
      will always reseed the PRNG from the true random number source between
      each read.  Thus, the output of RDSEED is guaranteed to be 100%
      entropic, unlike RDRAND which is only architecturally guaranteed to be
      1/512 entropic (although in practice is much more.)
      
      The RDSEED instruction takes the same time to execute as RDRAND, but
      RDSEED unlike RDRAND can legitimately return failure (CF=0) due to
      entropy exhaustion if too many threads on too many cores are hammering
      the RDSEED instruction at the same time.  Therefore, we have to be
      more conservative and only use it in places where we can tolerate
      failures.
      
      This patch introduces the primitives arch_get_random_seed_{int,long}()
      but does not use it yet.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d20f78d2
    • S
      powerpc/booke64: Critical and machine check exception support · 609af38f
      Scott Wood 提交于
      Add special state saving for critical and machine check exceptions.
      
      Most of this code could be used to handle debug exceptions taken from
      kernel space, but actually doing so is outside the scope of this patch.
      
      The various critical and machine check exceptions now point to their
      real handlers, rather than hanging the kernel.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      609af38f
    • S
      powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers · a3dc6207
      Scott Wood 提交于
      While bolted handlers (including e6500) do not need to deal with a TLB
      miss recursively causing another TLB miss, nested TLB misses can still
      happen with crit/mc/debug exceptions -- so we still need to honor
      SPRG_TLB_EXFRAME.
      
      We don't need to spend time modifying it in the TLB miss fastpath,
      though -- the special level exception will handle that.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: kvm-ppc@vger.kernel.org
      a3dc6207
    • S
      powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      Scott Wood 提交于
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
      9d378dfa
    • S
      powerpc/e6500: Make TLB lock recursive · 82d86de2
      Scott Wood 提交于
      Once special level interrupts are supported, we may take nested TLB
      misses -- so allow the same thread to acquire the lock recursively.
      
      The lock will not be effective against the nested TLB miss handler
      trying to write the same entry as the interrupted TLB miss handler, but
      that's also a problem on non-threaded CPUs that lack TLB write
      conditional.  This will be addressed in the patch that enables crit/mc
      support by invalidating the TLB on return from level exceptions.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      82d86de2
    • W
  6. 11 3月, 2014 1 次提交
  7. 07 3月, 2014 5 次提交
    • S
      powerpc/powernv Platform dump interface · c7e64b9c
      Stewart Smith 提交于
      This enables support for userspace to fetch and initiate FSP and
      Platform dumps from the service processor (via firmware) through sysfs.
      
      Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      
      Flow:
        - We register for OPAL notification events.
        - OPAL sends new dump available notification.
        - We make information on dump available via sysfs
        - Userspace requests dump contents
        - We retrieve the dump via OPAL interface
        - User copies the dump data
        - userspace sends ack for dump
        - We send ACK to OPAL.
      
      sysfs files:
        - We add the /sys/firmware/opal/dump directory
        - echoing 1 (well, anything, but in future we may support
          different dump types) to /sys/firmware/opal/dump/initiate_dump
          will initiate a dump.
        - Each dump that we've been notified of gets a directory
          in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
          as this is what's used elsewhere to identify the dump).
        - Each dump has files: id, type, dump and acknowledge
          dump is binary and is the dump itself.
          echoing 'ack' to acknowledge (currently any string will do) will
          acknowledge the dump and it will soon after disappear from sysfs.
      
      OPAL APIs:
        - opal_dump_init()
        - opal_dump_info()
        - opal_dump_read()
        - opal_dump_ack()
        - opal_dump_resend_notification()
      
      Currently we are only ever notified for one dump at a time (until
      the user explicitly acks the current dump, then we get a notification
      of the next dump), but this kernel code should "just work" when OPAL
      starts notifying us of all the dumps present.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c7e64b9c
    • S
      powerpc/powernv: Read OPAL error log and export it through sysfs · 774fea1a
      Stewart Smith 提交于
      Based on a patch by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      
      This patch adds support to read error logs from OPAL and export
      them to userspace through a sysfs interface.
      
      We export each log entry as a directory in /sys/firmware/opal/elog/
      
      Currently, OPAL will buffer up to 128 error log records, we don't
      need to have any knowledge of this limit on the Linux side as that
      is actually largely transparent to us.
      
      Each error log entry has the following files: id, type, acknowledge, raw.
      Currently we just export the raw binary error log in the 'raw' attribute.
      In a future patch, we may parse more of the error log to make it a bit
      easier for userspace (e.g. to be able to display a brief summary in
      petitboot without having to have a full parser).
      
      If we have >128 logs from OPAL, we'll only be notified of 128 until
      userspace starts acknowledging them. This limitation may be lifted in
      the future and with this patch, that should "just work" from the linux side.
      
      A userspace daemon should:
      - wait for error log entries using normal mechanisms (we announce creation)
      - read error log entry
      - save error log entry safely to disk
      - acknowledge the error log entry
      - rinse, repeat.
      
      On the Linux side, we read the error log when we're notified of it. This
      possibly isn't ideal as it would be better to only read them on-demand.
      However, this doesn't really work with current OPAL interface, so we
      read the error log immediately when notified at the moment.
      
      I've tested this pretty extensively and am rather confident that the
      linux side of things works rather well. There is currently an issue with
      the service processor side of things for >128 error logs though.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      774fea1a
    • H
      powerpc/pseries: Update dynamic cache nodes for suspend/resume operation · 6b36ba84
      Haren Myneni 提交于
      pHyp can change cache nodes for suspend/resume operation. Currently the
      device tree is updated by drmgr in userspace after all non boot CPUs are
      enabled. Hence, we do not modify the cache list based on the latest cache
      nodes. Also we do not remove cache entries for the primary CPU.
      
      This patch removes the cache list for the boot CPU, updates the device tree
      before enabling nonboot CPUs and adds cache list for the boot cpu.
      
      This patch also has the side effect that older versions of drmgr will
      perform a second device tree update from userspace. While this is a
      redundant waste of a couple cycles it is harmless since firmware returns the
      same data for the subsequent update-nodes/properties rtas calls.
      Signed-off-by: NHaren Myneni <hbabu@us.ibm.com>
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6b36ba84
    • N
      powerpc/pseries: Use remove_memory() to remove memory · 9ac8cde9
      Nathan Fontenot 提交于
      The memory remove code for powerpc/pseries should call remove_memory()
      so that we are holding the hotplug_memory lock during memory remove
      operations.
      
      This patch updates the memory node remove handler to call remove_memory()
      and adds a ppc_md.remove_memory() entry to handle pseries specific work
      that is called from arch_remove_memory().
      
      During memory remove in pseries_remove_memblock() we have to stay with
      removing memory one section at a time. This is needed because of how memory
      resources are handled. During memory add for pseries (via the probe file in
      sysfs) we add memory one section at a time which gives us a memory resource
      for each section. Future patches will aim to address this so will not have
      to remove memory one section at a time.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9ac8cde9
    • M
      powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO. · 55672ecf
      Mahesh Salgaonkar 提交于
      Detect and recover from machine check when inside opal on a special
      scom load instructions. On specific SCOM read via MMIO we may get a machine
      check exception with SRR0 pointing inside opal. To recover from MC
      in this scenario, get a recovery instruction address and return to it from
      MC.
      
      OPAL will export the machine check recoverable ranges through
      device tree node mcheck-recoverable-ranges under ibm,opal:
      
      # hexdump /proc/device-tree/ibm,opal/mcheck-recoverable-ranges
      0000000 0000 0000 3000 2804 0000 000c 0000 0000
      0000010 3000 2814 0000 0000 3000 27f0 0000 000c
      0000020 0000 0000 3000 2814 xxxx xxxx xxxx xxxx
      0000030 llll llll yyyy yyyy yyyy yyyy
      ...
      ...
      #
      
      where:
      	xxxx xxxx xxxx xxxx = Starting instruction address
      	llll llll           = Length of the address range.
      	yyyy yyyy yyyy yyyy = recovery address
      
      Each recoverable address range entry is (start address, len,
      recovery address), 2 cells each for start and recovery address, 1 cell for
      len, totalling 5 cells per entry. During kernel boot time, build up the
      recovery table with the list of recovery ranges from device-tree node which
      will be used during machine check exception to recover from MMIO SCOM UE.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      55672ecf
  8. 05 3月, 2014 4 次提交
  9. 28 2月, 2014 2 次提交
    • B
      powerpc/powernv: Fix opal_xscom_{read,write} prototype · 2f3f38e4
      Benjamin Herrenschmidt 提交于
      The OPAL firmware functions opal_xscom_read and opal_xscom_write
      take a 64-bit argument for the XSCOM (PCB) address in order to
      support the indirect mode on P8.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.13]
      2f3f38e4
    • P
      powerpc: Increase stack redzone for 64-bit userspace to 512 bytes · 573ebfa6
      Paul Mackerras 提交于
      The new ELFv2 little-endian ABI increases the stack redzone -- the
      area below the stack pointer that can be used for storing data --
      from 288 bytes to 512 bytes.  This means that we need to allow more
      space on the user stack when delivering a signal to a 64-bit process.
      
      To make the code a bit clearer, we define new USER_REDZONE_SIZE and
      KERNEL_REDZONE_SIZE symbols in ptrace.h.  For now, we leave the
      kernel redzone size at 288 bytes, since increasing it to 512 bytes
      would increase the size of interrupt stack frames correspondingly.
      
      Gcc currently only makes use of 288 bytes of redzone even when
      compiling for the new little-endian ABI, and the kernel cannot
      currently be compiled with the new ABI anyway.
      
      In the future, hopefully gcc will provide an option to control the
      amount of redzone used, and then we could reduce it even more.
      
      This also changes the code in arch_compat_alloc_user_space() to
      preserve the expanded redzone.  It is not clear why this function would
      ever be used on a 64-bit process, though.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      CC: <stable@vger.kernel.org> [v3.13]
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      573ebfa6
  10. 19 2月, 2014 1 次提交
  11. 17 2月, 2014 2 次提交
    • G
      powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan 提交于
      The patch cleans up variable eeh_subsystem_enabled so that we needn't
      refer the variable directly from external. Instead, we will use
      function eeh_enabled() and eeh_set_enable() to operate the variable.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2ec5a0ad
    • A
      powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard 提交于
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a0a4419e