1. 14 11月, 2016 9 次提交
  2. 30 10月, 2016 1 次提交
  3. 27 10月, 2016 2 次提交
    • N
      powerpc/64s: relocation, register save fixes for system reset interrupt · fb479e44
      Nicholas Piggin 提交于
      This patch does a couple of things. First of all, powernv immediately
      explodes when running a relocated kernel, because the system reset
      exception for handling sleeps does not do correct relocated branches.
      
      Secondly, the sleep handling code trashes the condition and cfar
      registers, which we would like to preserve for debugging purposes (for
      non-sleep case exception).
      
      This patch changes the exception to use the standard format that saves
      registers before any tests or branches are made. It adds the test for
      idle-wakeup as an "extra" to break out of the normal exception path.
      Then it branches to a relocated idle handler that calls the various
      idle handling functions.
      
      After this patch, POWER8 CPU simulator now boots powernv kernel that is
      running at non-zero.
      
      Fixes: 948cf67c ("powerpc: Add NAP mode support on Power7 in HV mode")
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Acked-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fb479e44
    • A
      powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu · bd77c449
      Aneesh Kumar K.V 提交于
      Before this patch, we used tlbiel, if we ever ran only on this core.
      That was mostly derived from the nohash usage of the same. But is
      incorrect, the ISA 3.0 clarifies tlbiel such that:
      
      "All TLB entries that have all of the following properties are made
      invalid on the thread executing the tlbiel instruction"
      
      ie. tlbiel only invalidates TLB entries on the current thread. So if the
      mm has been used on any other thread (aka. cpu) then we must broadcast
      the invalidate.
      
      This bug could lead to invalid TLB entries if a program runs on multiple
      threads of a core.
      
      Hence use tlbiel, if we only ever ran on only the current cpu.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bd77c449
  4. 22 10月, 2016 1 次提交
  5. 19 10月, 2016 1 次提交
  6. 08 10月, 2016 1 次提交
    • S
      powerpc: implement arch_reserved_kernel_pages · 1e76609c
      Srikar Dronamraju 提交于
      Currently significant amount of memory is reserved only in kernel booted
      to capture kernel dump using the fa_dump method.
      
      Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize
      only certain size memory per node.  The certain size takes into account
      the dentry and inode cache sizes.  Currently the cache sizes are
      calculated based on the total system memory including the reserved
      memory.  However such a kernel when booting the same kernel as fadump
      kernel will not be able to allocate the required amount of memory to
      suffice for the dentry and inode caches.  This results in crashes like
      
      Hence only implement arch_reserved_kernel_pages() for CONFIG_FA_DUMP
      configurations.  The amount reserved will be reduced while calculating
      the large caches and will avoid crashes like the below on large systems
      such as 32 TB systems.
      
        Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
        vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
        swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
        Call Trace:
           dump_stack+0xb0/0xf0 (unreliable)
           warn_alloc_failed+0x114/0x160
           __vmalloc_node_range+0x304/0x340
           __vmalloc+0x6c/0x90
           alloc_large_system_hash+0x1b8/0x2c0
           inode_init+0x94/0xe4
           vfs_caches_init+0x8c/0x13c
           start_kernel+0x50c/0x578
           start_here_common+0x20/0xa8
      
      Link: http://lkml.kernel.org/r/1472476010-4709-4-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Suggested-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e76609c
  7. 04 10月, 2016 13 次提交
    • N
      powerpc/bpf: Implement support for tail calls · ce076141
      Naveen N. Rao 提交于
      Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
      programs. This can be achieved either by:
      (1) retaining the stack setup by the first eBPF program and having all
      subsequent eBPF programs re-using it, or,
      (2) by unwinding/tearing down the stack and having each eBPF program
      deal with its own stack as it sees fit.
      
      To ensure that this does not create loops, there is a limit to how many
      tail calls can be done (currently 32). This requires the JIT'ed code to
      maintain a count of the number of tail calls done so far.
      
      Approach (1) is simple, but requires every eBPF program to have (almost)
      the same prologue/epilogue, regardless of whether they need it. This is
      inefficient for small eBPF programs which may not sometimes need a
      prologue at all. As such, to minimize impact of tail call
      implementation, we use approach (2) here which needs each eBPF program
      in the chain to use its own prologue/epilogue. This is not ideal when
      many tail calls are involved and when all the eBPF programs in the chain
      have similar prologue/epilogue. However, the impact is restricted to
      programs that do tail calls. Individual eBPF programs are not affected.
      
      We maintain the tail call count in a fixed location on the stack and
      updated tail call count values are passed in through this. The very
      first eBPF program in a chain sets this up to 0 (the first 2
      instructions). Subsequent tail calls skip the first two eBPF JIT
      instructions to maintain the count. For programs that don't do tail
      calls themselves, the first two instructions are NOPs.
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ce076141
    • C
      powerpc: tm: Enable transactional memory (TM) lazily for userspace · 5d176f75
      Cyril Bur 提交于
      Currently the MSR TM bit is always set if the hardware is TM capable.
      This adds extra overhead as it means the TM SPRS (TFHAR, TEXASR and
      TFAIR) must be swapped for each process regardless of if they use TM.
      
      For processes that don't use TM the TM MSR bit can be turned off
      allowing the kernel to avoid the expensive swap of the TM registers.
      
      A TM unavailable exception will occur if a thread does use TM and the
      kernel will enable MSR_TM and leave it so for some time afterwards.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5d176f75
    • C
      powerpc: Remove do_load_up_transact_{fpu,altivec} · d986d6f4
      Cyril Bur 提交于
      Previous rework of TM code leaves these functions unused
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d986d6f4
    • C
      powerpc: tm: Rename transct_(*) to ck(\1)_state · 000ec280
      Cyril Bur 提交于
      Make the structures being used for checkpointed state named
      consistently with the pt_regs/ckpt_regs.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      000ec280
    • C
      powerpc: tm: Always use fp_state and vr_state to store live registers · dc310669
      Cyril Bur 提交于
      There is currently an inconsistency as to how the entire CPU register
      state is saved and restored when a thread uses transactional memory
      (TM).
      
      Using transactional memory results in the CPU having duplicated
      (almost) all of its register state. This duplication results in a set
      of registers which can be considered 'live', those being currently
      modified by the instructions being executed and another set that is
      frozen at a point in time.
      
      On context switch, both sets of state have to be saved and (later)
      restored. These two states are often called a variety of different
      things. Common terms for the state which only exists after the CPU has
      entered a transaction (performed a TBEGIN instruction) in hardware are
      'transactional' or 'speculative'.
      
      Between a TBEGIN and a TEND or TABORT (or an event that causes the
      hardware to abort), regardless of the use of TSUSPEND the
      transactional state can be referred to as the live state.
      
      The second state is often to referred to as the 'checkpointed' state
      and is a duplication of the live state when the TBEGIN instruction is
      executed. This state is kept in the hardware and will be rolled back
      to on transaction failure.
      
      Currently all the registers stored in pt_regs are ALWAYS the live
      registers, that is, when a thread has transactional registers their
      values are stored in pt_regs and the checkpointed state is in
      ckpt_regs. A strange opposite is true for fp_state/vr_state. When a
      thread is non transactional fp_state/vr_state holds the live
      registers. When a thread has initiated a transaction fp_state/vr_state
      holds the checkpointed state and transact_fp/transact_vr become the
      structure which holds the live state (at this point it is a
      transactional state).
      
      This method creates confusion as to where the live state is, in some
      circumstances it requires extra work to determine where to put the
      live state and prevents the use of common functions designed (probably
      before TM) to save the live state.
      
      With this patch pt_regs, fp_state and vr_state all represent the
      same thing and the other structures [pending rename] are for
      checkpointed state.
      Acked-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dc310669
    • C
      powerpc: signals: Stop using current in signal code · d1199431
      Cyril Bur 提交于
      Much of the signal code takes a pt_regs on which it operates. Over
      time the signal code has needed to know more about the thread than
      what pt_regs can supply, this information is obtained as needed by
      using 'current'.
      
      This approach is not strictly incorrect however it does mean that
      there is now a hard requirement that the pt_regs being passed around
      does belong to current, this is never checked. A safer approach is for
      the majority of the signal functions to take a task_struct from which
      they can obtain pt_regs and any other information they need. The
      caveat that the task_struct they are passed must be current doesn't go
      away but can more easily be checked for.
      
      Functions called from outside powerpc signal code are passed a pt_regs
      and they can confirm that the pt_regs is that of current and pass
      current to other functions, furthurmore, powerpc signal functions can
      check that the task_struct they are passed is the same as current
      avoiding possible corruption of current (or the task they are passed)
      if this assertion ever fails.
      
      CC: paulus@samba.org
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d1199431
    • C
      powerpc: Return the new MSR from msr_check_and_set() · 3cee070a
      Cyril Bur 提交于
      msr_check_and_set() always performs a mfmsr() to determine if it needs
      to perform an mtmsr(), as mfmsr() can be a costly operation
      msr_check_and_set() could return the MSR now on the CPU to avoid
      callers of msr_check_and_set having to make their own mfmsr() call.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3cee070a
    • G
      powerpc/powernv: Specify proper data type for PCI_SLOT_ID_PREFIX · 066bcd78
      Gavin Shan 提交于
      This fixes the warning reported from sparse:
      
        eeh-powernv.c:875:23: warning: constant 0x8000000000000000 is so big it is unsigned long
      
      Fixes: ebe22531 ("powerpc/powernv: Support PCI slot ID")
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      066bcd78
    • A
      powerpc: Remove static branch prediction in atomic{, 64}_add_unless · 61e98ebf
      Anton Blanchard 提交于
      I see quite a lot of static branch mispredictions on a simple
      web serving workload. The issue is in __atomic_add_unless(), called
      from _atomic_dec_and_lock(). There is no obvious common case, so it
      is better to let the hardware predict the branch.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      61e98ebf
    • A
      powerpc: During context switch, check before setting mm_cpumask · bb85fb58
      Anton Blanchard 提交于
      During context switch, switch_mm() sets our current CPU in mm_cpumask.
      We can avoid this atomic sequence in most cases by checking before
      setting the bit.
      
      Testing on a POWER8 using our context switch microbenchmark:
      
      tools/testing/selftests/powerpc/benchmarks/context_switch \
      	--process --no-fp --no-altivec --no-vector
      
      Performance improves 2%.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bb85fb58
    • N
      powerpc: Use gas sections for arranging exception vectors · 57f26649
      Nicholas Piggin 提交于
      Use assembler sections of fixed size and location to arrange the 64-bit
      Book3S exception vector code (64-bit Book3E also uses it in head_64.S
      for 0x0..0x100).
      
      This allows better flexibility in arranging exception code and hiding
      unimportant details behind macros.
      
      Gas sections can be a bit painful to use this way, mainly because the
      assembler does not know where they will be finally linked. Taking
      absolute addresses requires a bit of trickery for example, but it can
      be hidden behind macros for the most part.
      
      Generated code is mostly the same except locations, offsets, alignments.
      
      The "+ 0x2" is only required for the trap number / kvm exit number,
      which gets loaded as a constant into a register.
      
      Previously, code also used + 0x2 for label names, but we changed to
      using "H" to distinguish HV case for that. Remove the last vestiges
      of that.
      
      __after_prom_start is taking absolute address of a label in another
      fixed section. Newer toolchains seemed to compile this okay, but older
      ones do not. FIXED_SYMBOL_ABS_ADDR is more foolproof, it just takes an
      additional line to define.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      57f26649
    • N
      powerpc/64s: Consolidate exception handler alignment · be642c34
      Nicholas Piggin 提交于
      Move exception handler alignment directives into the head-64.h macros,
      beause they will no longer work in-place after the next patch. This
      slightly changes functions that have alignments applied and therefore
      code generation, which is why it was not done initially (see earlier
      patch).
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      be642c34
    • M
      powerpc/64s: Add new exception vector macros · da2bc464
      Michael Ellerman 提交于
      Create arch/powerpc/include/asm/head-64.h with macros that specify
      an exception vector (name, type, location), which will be used to
      label and lay out exceptions into the object file.
      
      Naming is moved out of exception-64s.h, which is used to specify the
      implementation of exception handlers.
      
      objdump of generated code in exception vectors is unchanged except for
      names. Alignment directives scattered around are annoying, but done
      this way so that disassembly can verify identical instruction
      generation before and after patch. These get cleaned up in future
      patch.
      
      We change the way KVMTEST works, explicitly passing EXC_HV or EXC_STD
      rather than overloading the trap number. This removes the need to have
      SOFTEN values for the overloaded trap numbers, eg. 0x502.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      da2bc464
  8. 29 9月, 2016 2 次提交
    • B
      KVM: PPC: Book3S HV: Migrate pinned pages out of CMA · 2e5bbb54
      Balbir Singh 提交于
      When PCI Device pass-through is enabled via VFIO, KVM-PPC will
      pin pages using get_user_pages_fast(). One of the downsides of
      the pinning is that the page could be in CMA region. The CMA
      region is used for other allocations like the hash page table.
      Ideally we want the pinned pages to be from non CMA region.
      
      This patch (currently only for KVM PPC with VFIO) forcefully
      migrates the pages out (huge pages are omitted for the moment).
      There are more efficient ways of doing this, but that might
      be elaborate and might impact a larger audience beyond just
      the kvm ppc implementation.
      
      The magic is in new_iommu_non_cma_page() which allocates the
      new page from a non CMA region.
      
      I've tested the patches lightly at my end. The full solution
      requires migration of THP pages in the CMA region. That work
      will be done incrementally on top of this.
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      Acked-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [mpe: Merged via powerpc tree as that's where the changes are]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2e5bbb54
    • G
      drivers/pci/hotplug: Support surprise hotplug in powernv driver · 360aebd8
      Gavin Shan 提交于
      This supports PCI surprise hotplug. The design is highlighted as
      below:
      
         * The PCI slot's surprise hotplug capability is exposed through
           device node property "ibm,slot-surprise-pluggable", meaning
           PCI surprise hotplug will be disabled if skiboot doesn't support
           it yet.
         * The interrupt because of presence or link state change is raised
           on surprise hotplug event. One event is allocated and queued to
           the PCI slot for workqueue to pick it up and process in serialized
           fashion. The code flow for surprise hotplug is same to that for
           managed hotplug except: the affected PEs are put into frozen state
           to avoid unexpected EEH error reporting in surprise hot remove path.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      360aebd8
  9. 27 9月, 2016 3 次提交
    • T
      KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register · fa73c3b2
      Thomas Huth 提交于
      The MMCR2 register is available twice, one time with number 785
      (privileged access), and one time with number 769 (unprivileged,
      but it can be disabled completely). In former times, the Linux
      kernel was using the unprivileged register 769 only, but since
      commit 8dd75ccb ("powerpc: Use privileged SPR number
      for MMCR2"), it uses the privileged register 785 instead.
      The KVM-PR code then of course also switched to use the SPR 785,
      but this is causing older guest kernels to crash, since these
      kernels still access 769 instead. So to support older kernels
      with KVM-PR again, we have to support register 769 in KVM-PR, too.
      
      Fixes: 8dd75ccb
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      fa73c3b2
    • B
      KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie · 4f053d06
      Balbir Singh 提交于
      Remove duplicate setting of the the "B" field when doing a tlbie(l).
      In compute_tlbie_rb(), the "B" field is set again just before
      returning the rb value to be used for tlbie(l).
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4f053d06
    • P
      KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Paul Mackerras 提交于
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      88b02cf9
  10. 25 9月, 2016 3 次提交
  11. 23 9月, 2016 4 次提交