1. 03 11月, 2014 1 次提交
    • A
      powerpc: Convert power off logic to pm_power_off · 9178ba29
      Alexander Graf 提交于
      The generic Linux framework to power off the machine is a function pointer
      called pm_power_off. The trick about this pointer is that device drivers can
      potentially implement it rather than board files.
      
      Today on powerpc we set pm_power_off to invoke our generic full machine power
      off logic which then calls ppc_md.power_off to invoke machine specific power
      off.
      
      However, when we want to add a power off GPIO via the "gpio-poweroff" driver,
      this card house falls apart. That driver only registers itself if pm_power_off
      is NULL to ensure it doesn't override board specific logic. However, since we
      always set pm_power_off to the generic power off logic (which will just not
      power off the machine if no ppc_md.power_off call is implemented), we can't
      implement power off via the generic GPIO power off driver.
      
      To fix this up, let's get rid of the ppc_md.power_off logic and just always use
      pm_power_off as was intended. Then individual drivers such as the GPIO power off
      driver can implement power off logic via that function pointer.
      
      With this patch set applied and a few patches on top of QEMU that implement a
      power off GPIO on the virt e500 machine, I can successfully turn off my virtual
      machine after halt.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      [mpe: Squash into one patch and update changelog based on cover letter]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9178ba29
  2. 22 10月, 2014 2 次提交
    • P
      powerpc: Wire up sys_bpf() syscall · fcbb539f
      Pranith Kumar 提交于
      This patch wires up the new syscall sys_bpf() on powerpc.
      
      Passes the tests in samples/bpf:
      
          #0 add+sub+mul OK
          #1 unreachable OK
          #2 unreachable2 OK
          #3 out of range jump OK
          #4 out of range jump2 OK
          #5 test1 ld_imm64 OK
          #6 test2 ld_imm64 OK
          #7 test3 ld_imm64 OK
          #8 test4 ld_imm64 OK
          #9 test5 ld_imm64 OK
          #10 no bpf_exit OK
          #11 loop (back-edge) OK
          #12 loop2 (back-edge) OK
          #13 conditional loop OK
          #14 read uninitialized register OK
          #15 read invalid register OK
          #16 program doesn't init R0 before exit OK
          #17 stack out of bounds OK
          #18 invalid call insn1 OK
          #19 invalid call insn2 OK
          #20 invalid function call OK
          #21 uninitialized stack1 OK
          #22 uninitialized stack2 OK
          #23 check valid spill/fill OK
          #24 check corrupted spill/fill OK
          #25 invalid src register in STX OK
          #26 invalid dst register in STX OK
          #27 invalid dst register in ST OK
          #28 invalid src register in LDX OK
          #29 invalid dst register in LDX OK
          #30 junk insn OK
          #31 junk insn2 OK
          #32 junk insn3 OK
          #33 junk insn4 OK
          #34 junk insn5 OK
          #35 misaligned read from stack OK
          #36 invalid map_fd for function call OK
          #37 don't check return value before access OK
          #38 access memory with incorrect alignment OK
          #39 sometimes access memory with incorrect alignment OK
          #40 jump test 1 OK
          #41 jump test 2 OK
          #42 jump test 3 OK
          #43 jump test 4 OK
      Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
      [mpe: test using samples/bpf]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fcbb539f
    • A
      powerpc/mm: Remove redundant #if case · ca5f1d16
      Aneesh Kumar K.V 提交于
      Remove the check of CONFIG_PPC_SUBPAGE_PROT when deciding if
      is_hugepage_only_range() is extern or inline. The extern version is in
      slice.c and is built if CONFIG_PPC_MM_SLICES=y.
      
      There was no build break possible because CONFIG_PPC_SUBPAGE_PROT is
      only selectable under conditions which also mean CONFIG_PPC_MM_SLICES
      will be selected.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ca5f1d16
  3. 15 10月, 2014 4 次提交
    • G
      powerpc/eeh: Block PCI config access upon frozen PE · b6541db1
      Gavin Shan 提交于
      The problem was found when I tried to inject PCI config error by
      PHB3 PAPR error injection registers into Broadcom Austin 4-ports
      NIC adapter. The frozen PE was reported successfully and EEH core
      started to recover it. However, I run into fenced PHB when dumping
      PCI config space as EEH logs. I was told that PCI config requests
      should not be progagated to the adapter until PE reset is done
      successfully. Otherise, we would run out of PHB internal credits
      and trigger PCT (PCIE Completion Timeout), which leads to the
      fenced PHB.
      
      The patch introduces another PE flag EEH_PE_CFG_RESTRICTED, which
      is set during PE initialization time if the PE includes the specific
      PCI devices that need block PCI config access until PE reset is done.
      When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is
      set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config
      access to the PE will be dropped by platform PCI accessors until
      PE reset is done successfully. The mechanism is shared by PowerNV
      platform owned PE or userland owned ones. It's not used on pSeries
      platform yet.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b6541db1
    • G
      powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED · 8a6b3710
      Gavin Shan 提交于
      The flag EEH_PE_RESET indicates blocking config space of the PE
      during reset time. We potentially need block PE's config space
      other than reset time. So it's reasonable to replace it with
      EEH_PE_CFG_BLOCKED to indicate its usage.
      
      There are no substantial code or logic changes in this patch.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8a6b3710
    • A
      powerpc: Rename __get_SP() to current_stack_pointer() · acf620ec
      Anton Blanchard 提交于
      Michael points out that __get_SP() is a pretty horrible
      function name. Let's give it a better name.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      acf620ec
    • A
      powerpc: Reimplement __get_SP() as a function not a define · bfe9a2cf
      Anton Blanchard 提交于
      Li Zhong points out an issue with our current __get_SP()
      implementation. If ftrace function tracing is enabled (ie -pg
      profiling using _mcount) we spill a stack frame on 64bit all the
      time.
      
      If a function calls __get_SP() and later calls a function that is
      tail call optimised, we will pop the stack frame and the value
      returned by __get_SP() is no longer valid. An example from Li can
      be found in save_stack_trace -> save_context_stack:
      
      c0000000000432c0 <.save_stack_trace>:
      c0000000000432c0:       mflr    r0
      c0000000000432c4:       std     r0,16(r1)
      c0000000000432c8:       stdu    r1,-128(r1) <-- stack frame for _mcount
      c0000000000432cc:       std     r3,112(r1)
      c0000000000432d0:       bl      <._mcount>
      c0000000000432d4:       nop
      
      c0000000000432d8:       mr      r4,r1 <-- __get_SP()
      
      c0000000000432dc:       ld      r5,632(r13)
      c0000000000432e0:       ld      r3,112(r1)
      c0000000000432e4:       li      r6,1
      
      c0000000000432e8:       addi    r1,r1,128 <-- pop stack frame
      
      c0000000000432ec:       ld      r0,16(r1)
      c0000000000432f0:       mtlr    r0
      c0000000000432f4:       b       <.save_context_stack> <-- tail call optimized
      
      save_context_stack ends up with a stack pointer below the current
      one, and it is likely to be scribbled over.
      
      Fix this by making __get_SP() a function which returns the
      callers stack frame. Also replace inline assembly which grabs
      the stack pointer in save_stack_trace and show_stack with
      __get_SP().
      
      This also fixes an issue with perf_arch_fetch_caller_regs().
      It currently unwinds the stack once, which will skip a
      valid stack frame on a leaf function. With the __get_SP() fixes
      in this patch, we never need to unwind the stack frame to get
      to the first interesting frame.
      
      We have to export __get_SP() because perf_arch_fetch_caller_regs()
      (which is used in modules) calls it from a header file.
      Reported-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bfe9a2cf
  4. 10 10月, 2014 2 次提交
    • R
      powerpc: Fix sys_call_table declaration to enable syscall tracing · 1028ccf5
      Romeo Cane 提交于
      Declaring sys_call_table as a pointer causes the compiler to generate
      the wrong lookup code in arch_syscall_addr().
      
           <arch_syscall_addr>:
              lis     r9,-16384
              rlwinm  r3,r3,2,0,29
        -     lwz     r11,30640(r9)
        -     lwzx    r3,r11,r3
        +     addi    r9,r9,30640
        +     lwzx    r3,r9,r3
              blr
      
      The actual sys_call_table symbol, declared in assembler, is an
      array. If we lie about that to the compiler we get the wrong code
      generated, as above.
      
      This definition seems only to be used by the syscall tracing code in
      kernel/trace/trace_syscalls.c. With this patch I can successfully use
      the syscall tracepoints:
      
        bash-3815  [002] ....   333.239082: sys_write -> 0x2
        bash-3815  [002] ....   333.239087: sys_dup2(oldfd: a, newfd: 1)
        bash-3815  [002] ....   333.239088: sys_dup2 -> 0x1
        bash-3815  [002] ....   333.239092: sys_fcntl(fd: a, cmd: 1, arg: 0)
        bash-3815  [002] ....   333.239093: sys_fcntl -> 0x1
        bash-3815  [002] ....   333.239094: sys_close(fd: a)
        bash-3815  [002] ....   333.239094: sys_close -> 0x0
      Signed-off-by: NRomeo Cane <romeo.cane.ext@coriant.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1028ccf5
    • M
      mm: remove misleading ARCH_USES_NUMA_PROT_NONE · 6a33979d
      Mel Gorman 提交于
      ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
      _PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
      relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
      fault scanner.  This was found to be conceptually confusing with a lot of
      implicit assumptions and it was asked that an alternative be found.
      
      Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
      PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
      bits and shrunk the maximum possible swap size but it did not go far
      enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
      but the relics still exist.
      
      This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
      duplication in powerpc vs the generic implementation by defining the types
      the core NUMA helpers expected to exist from x86 with their ppc64
      equivalent.  This necessitated that a PTE bit mask be created that
      identified the bits that distinguish present from NUMA pte entries but it
      is expected this will only differ between arches based on _PAGE_PROTNONE.
      The naming for the generic helpers was taken from x86 originally but ppc64
      has types that are equivalent for the purposes of the helper so they are
      mapped instead of duplicating code.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a33979d
  5. 08 10月, 2014 6 次提交
  6. 07 10月, 2014 1 次提交
  7. 03 10月, 2014 1 次提交
  8. 02 10月, 2014 4 次提交
  9. 30 9月, 2014 11 次提交
  10. 25 9月, 2014 7 次提交
    • W
      powerpc/eeh: Fix kernel crash when passing through VF · 2a58222f
      Wei Yang 提交于
      When doing vfio passthrough a VF, the kernel will crash with following
      message:
      
      [  442.656459] Unable to handle kernel paging request for data at address 0x00000060
      [  442.656593] Faulting instruction address: 0xc000000000038b88
      [  442.656706] Oops: Kernel access of bad area, sig: 11 [#1]
      [  442.656798] SMP NR_CPUS=1024 NUMA PowerNV
      [  442.656890] Modules linked in: vfio_pci mlx4_core nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw tg3 nfsd be2net nfs_acl ses lockd ptp enclosure pps_core kvm_hv kvm_pr shpchp binfmt_misc kvm sunrpc uinput lpfc scsi_transport_fc ipr scsi_tgt [last unloaded: mlx4_core]
      [  442.658152] CPU: 40 PID: 14948 Comm: qemu-system-ppc Not tainted 3.10.42yw-pkvm+ #37
      [  442.658219] task: c000000f7e2a9a00 ti: c000000f6dc3c000 task.ti: c000000f6dc3c000
      [  442.658287] NIP: c000000000038b88 LR: c0000000004435a8 CTR: c000000000455bc0
      [  442.658352] REGS: c000000f6dc3f580 TRAP: 0300   Not tainted  (3.10.42yw-pkvm+)
      [  442.658419] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004882  XER: 20000000
      [  442.658577] CFAR: c00000000000908c DAR: 0000000000000060 DSISR: 40000000 SOFTE: 1
      GPR00: c0000000004435a8 c000000f6dc3f800 c0000000012b1c10 c00000000da24000
      GPR04: 0000000000000003 0000000000001004 00000000000015b3 000000000000ffff
      GPR08: c00000000127f5d8 0000000000000000 000000000000ffff 0000000000000000
      GPR12: c000000000068078 c00000000fdd6800 000001003c320c80 000001003c3607f0
      GPR16: 0000000000000001 00000000105480c8 000000001055aaa8 000001003c31ab18
      GPR20: 000001003c10fb40 000001003c360ae8 000000001063bcf0 000000001063bdb0
      GPR24: 000001003c15ed70 0000000010548f40 c000001fe5514c88 c000001fe5514cb0
      GPR28: c00000000da24000 0000000000000000 c00000000da24000 0000000000000003
      [  442.659471] NIP [c000000000038b88] .pcibios_set_pcie_reset_state+0x28/0x130
      [  442.659530] LR [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659585] Call Trace:
      [  442.659610] [c000000f6dc3f800] [00000000000719e0] 0x719e0 (unreliable)
      [  442.659677] [c000000f6dc3f880] [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659757] [c000000f6dc3f900] [c000000000455bf8] .reset_fundamental+0x38/0x80
      [  442.659835] [c000000f6dc3f980] [c0000000004562a8] .pci_dev_specific_reset+0xa8/0xf0
      [  442.659913] [c000000f6dc3fa00] [c0000000004448c4] .__pci_dev_reset+0x44/0x430
      [  442.659980] [c000000f6dc3fab0] [c000000000444d5c] .pci_reset_function+0x7c/0xc0
      [  442.660059] [c000000f6dc3fb30] [d00000001c141ab8] .vfio_pci_open+0xe8/0x2b0 [vfio_pci]
      [  442.660139] [c000000f6dc3fbd0] [c000000000586c30] .vfio_group_fops_unl_ioctl+0x3a0/0x630
      [  442.660219] [c000000f6dc3fc90] [c000000000255fbc] .do_vfs_ioctl+0x4ec/0x7c0
      [  442.660286] [c000000f6dc3fd80] [c000000000256364] .SyS_ioctl+0xd4/0xf0
      [  442.660354] [c000000f6dc3fe30] [c000000000009e54] syscall_exit+0x0/0x98
      [  442.660420] Instruction dump:
      [  442.660454] 4bfffce9 4bfffee4 7c0802a6 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7e1b78
      [  442.660566] 7c9f2378 60000000 60000000 e93e02c8 <e8690060> 2fa30000 41de00c4 2b9f0002
      [  442.660679] ---[ end trace a64ac9546bcf0328 ]---
      [  442.660724]
      
      The reason is current VF is not EEH enabled.
      
      This patch introduces a macro to convert eeh_dev to eeh_pe. By doing so, it
      will prevent converting with NULL pointer.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      
      V3 -> V4:
         1. move the macro definition from include/linux/pci.h to
            arch/powerpc/include/asm/eeh.h
      
      V2 -> V3:
         1. rebased on 3.17-rc4
         2. introduce a macro
         3. use this macro in several other places
      
      V1 -> V2:
         1. code style and patch subject adjustment
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2a58222f
    • P
      powerpc: Emulate icbi, mcrf and conditional-trap instructions · cf87c3f6
      Paul Mackerras 提交于
      This extends the instruction emulation done by analyse_instr() and
      emulate_step() to handle a few more instructions that are found in
      the kernel.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cf87c3f6
    • P
      powerpc: Split out instruction analysis part of emulate_step() · be96f633
      Paul Mackerras 提交于
      This splits out the instruction analysis part of emulate_step() into
      a separate analyse_instr() function, which decodes the instruction,
      but doesn't execute any load or store instructions.  It does execute
      integer instructions and branches which can be executed purely by
      updating register values in the pt_regs struct.  For other instructions,
      it returns the instruction type and other details in a new
      instruction_op struct.  emulate_step() then uses that information
      to execute loads, stores, cache operations, mfmsr, mtmsr[d], and
      (on 64-bit) sc instructions.
      
      The reason for doing this is so that the KVM code can use it instead
      of having its own separate instruction emulation code.  Possibly the
      alignment interrupt handler could also use this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      be96f633
    • P
      powerpc/powernv: Don't call generic code on offline cpus · d6a4f709
      Paul Mackerras 提交于
      On PowerNV platforms, when a CPU is offline, we put it into nap mode.
      It's possible that the CPU wakes up from nap mode while it is still
      offline due to a stray IPI.  A misdirected device interrupt could also
      potentially cause it to wake up.  In that circumstance, we need to clear
      the interrupt so that the CPU can go back to nap mode.
      
      In the past the clearing of the interrupt was accomplished by briefly
      enabling interrupts and allowing the normal interrupt handling code
      (do_IRQ() etc.) to handle the interrupt.  This has the problem that
      this code calls irq_enter() and irq_exit(), which call functions such
      as account_system_vtime() which use RCU internally.  Use of RCU is not
      permitted on offline CPUs and will trigger errors if RCU checking is
      enabled.
      
      To avoid calling into any generic code which might use RCU, we adopt
      a different method of clearing interrupts on offline CPUs.  Since we
      are on the PowerNV platform, we know that the system interrupt
      controller is a XICS being driven directly (i.e. not via hcalls) by
      the kernel.  Hence this adds a new icp_native_flush_interrupt()
      function to the native-mode XICS driver and arranges to call that
      when an offline CPU is woken from nap.  This new function reads the
      interrupt from the XICS.  If it is an IPI, it clears the IPI; if it
      is a device interrupt, it prints a warning and disables the source.
      Then it does the end-of-interrupt processing for the interrupt.
      
      The other thing that briefly enabling interrupts did was to check and
      clear the irq_happened flag in this CPU's PACA.  Therefore, after
      flushing the interrupt from the XICS, we also clear all bits except
      the PACA_IRQ_HARD_DIS (interrupts are hard disabled) bit from the
      irq_happened flag.  The PACA_IRQ_HARD_DIS flag is set by power7_nap()
      and is left set to indicate that interrupts are hard disabled.  This
      means we then have to ignore that flag in power7_nap(), which is
      reasonable since it doesn't indicate that any interrupt event needs
      servicing.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d6a4f709
    • A
      powerpc: Move htab_remove_mapping function prototype into header file · f6026df1
      Anton Blanchard 提交于
      A recent patch added a function prototype for htab_remove_mapping in
      c code. Fix it.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f6026df1
    • A
      powerpc: Remove stale function prototypes · a38efcea
      Anton Blanchard 提交于
      There were a number of prototypes for functions that no longer
      exist. Remove them.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a38efcea
    • M
      powerpc/powernv: Add OPAL check token call · bffe6bda
      Michael Neuling 提交于
      Currently there is no way to generically check if an OPAL call exists or not
      from the host kernel.
      
      This adds an OPAL call opal_check_token() which tells you if the given token is
      present in OPAL or not.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bffe6bda
  11. 24 9月, 2014 1 次提交