1. 17 Feb, 2014 9 commits
    • powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan committed
      The patch cleans up the variable eeh_subsystem_enabled so that external
      code needn't refer to it directly. Instead, the functions eeh_enabled()
      and eeh_set_enable() are used to read and set the variable.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Rework EEH reset · 5b2e198e
      Gavin Shan committed
      When doing a reset in order to recover the affected PE, we issue a
      hot reset on the PE primary bus if it's not the root bus. Otherwise, we
      issue a hot or fundamental reset on the root port or PHB accordingly.
      For the latter case, we didn't cover the situation where the PE only
      includes the root port, which potentially causes a kernel crash upon
      an EEH error on the PE.
      
      The patch reworks the logic of EEH reset to improve the code
      readability and also avoid the kernel crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1
      Anton Blanchard committed
      We are seeing a lot of hits in the VDSO that are not resolved by perf.
      A while(1) gettimeofday() loop shows the issue:
      
      27.64%  [vdso]  [.] 0x000000000000060c
      22.57%  [vdso]  [.] 0x0000000000000628
      16.88%  [vdso]  [.] 0x0000000000000610
      12.39%  [vdso]  [.] __kernel_gettimeofday
       6.09%  [vdso]  [.] 0x00000000000005f8
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We are using a stripped VDSO image, which means only symbols with
      relocation info can be resolved. There isn't much point in stripping
      the VDSO; the debug info is only about 1kB:
      
      4680 arch/powerpc/kernel/vdso64/vdso64.so
      5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
      
      By using the unstripped image, we can resolve all the symbols in the
      VDSO and the perf profile data looks much better:
      
      76.53%  [vdso]  [.] __do_get_tspec
      12.20%  [vdso]  [.] __kernel_gettimeofday
       5.05%  [vdso]  [.] __get_datapage
       3.20%  test    [.] main
       2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard committed
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
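The effect of the link address on symbol lookup can be illustrated with a small model. This is a sketch of one plausible failure mode, not a description of how perf is actually implemented: it assumes a naive resolver that treats "sample address minus mapping start" as the symbol address. The symbol address, size, and the helper names `resolve` and `symbols_linked_at` are hypothetical.

```python
from bisect import bisect_right


def resolve(symbols, offset):
    """Return the name of the symbol covering `offset`, or None.

    `symbols` is a list of (link_address, size, name) sorted by address.
    """
    starts = [start for start, _size, _name in symbols]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        return None
    start, size, name = symbols[i]
    return name if offset < start + size else None


def symbols_linked_at(base):
    # Hypothetical layout: one function 0x5C0 bytes into the image.
    return [(base + 0x5C0, 0x100, "__kernel_gettimeofday")]


# Offset of a sample inside the mapped VDSO (address - mapping start).
sample_offset = 0x5E0

# Image linked at 1MB: the symbol table holds 0x1005C0, so the lookup misses.
at_1mb = resolve(symbols_linked_at(0x100000), sample_offset)

# Image linked at 0x0: offsets and symbol addresses agree, so the lookup hits.
at_zero = resolve(symbols_linked_at(0x0), sample_offset)
```

A resolver that subtracts the image's link base before the lookup would handle both cases, which matches the remark in the commit message that a 1MB link address is valid and perf userspace could also be fixed.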
    • mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9
      Aneesh Kumar K.V committed
      Archs like ppc64 don't do a TLB flush in the set_pte/pmd functions when
      using a hash table MMU, for various reasons (the flush is handled as part
      of the PTE modification when necessary).

      ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

      Additionally, ppc64 requires the TLB flushing to be batched within ptl locks.

      The reason to do that is to ensure that the hash page table is in sync with
      the Linux page table.

      We track the hpte index in the Linux PTE, and if we clear it without
      flushing the hash and then drop the ptl lock, another CPU can update the
      PTE and we can end up with a duplicate entry in the hash table, which is
      fatal.

      We also want to keep set_pte_at() simple by not requiring it to do a hash
      flush, for performance reasons. We do that by assuming that set_pte_at() is
      never *ever* called on a PTE that is already valid.

      This was the case until the NUMA code went in, which broke that assumption.
      
      Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
      way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
      using set_pte_at() and a powerpc specific one using the appropriate
      mechanism needed to keep the hash table in sync.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d
      Aneesh Kumar K.V committed
      pte_update() is a powerpc-ism used to change the bits of a PTE
      when the access permission is being restricted (a flush is
      potentially needed).

      It uses atomic operations when needed and handles the hash
      synchronization on hash based processors.
      
      It is currently only used to clear PTE bits and so the current
      implementation doesn't provide a way to also set PTE bits.
      
      The new _PAGE_NUMA bit, when set, is actually restricting access
      so it must use that function too, so this change adds the ability
      for pte_update() to also set bits.
      
      We will use this later to set the _PAGE_NUMA bit.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
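The clear-and-set semantics described in this commit message can be modelled in a few lines. This is only a sketch of the bit arithmetic: the real kernel function is C, works atomically on a page table entry, and handles hash synchronization, none of which is modelled here. The `Pte` class, the bit values, and the choice of bits in the usage lines are assumptions made for illustration.

```python
# Hypothetical bit values; the real ones are architecture specific.
PAGE_PRESENT = 0x001
PAGE_RW = 0x002
PAGE_NUMA = 0x004


class Pte:
    """Toy stand-in for a page table entry holding a flags word."""

    def __init__(self, value):
        self.value = value


def pte_update(pte, clr, set_bits):
    """Clear the `clr` bits, set the `set_bits` bits, return the old value."""
    old = pte.value
    pte.value = (old & ~clr) | set_bits
    return old


# Before the change only `clr` existed; with `set_bits` one call can both
# clear bits and set a bit such as PAGE_NUMA.
pte = Pte(PAGE_PRESENT | PAGE_RW)
old_value = pte_update(pte, clr=PAGE_PRESENT, set_bits=PAGE_NUMA)
```

Passing `set_bits=0` reproduces the previous clear-only behaviour, so existing callers would not need to change what they compute.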
    • powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a
      Kleber Sacilotto de Souza committed
      Rev3 of the PCI Express Base Specification defines a Supported Link
      Speeds Vector where the bit definitions within this field are:
      
      Bit 0 - 2.5 GT/s
      Bit 1 - 5.0 GT/s
      Bit 2 - 8.0 GT/s
      
      This vector definition is used by the platform firmware to export the
      maximum and current link speeds of the PCI bus via the
      "ibm,pcie-link-speed-stats" device-tree property.
      
      This patch updates pseries_root_bridge_prepare() to detect Gen3
      speed buses (defined by 0x04).
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
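The bit definitions listed in this commit message map directly onto the values 0x01, 0x02 and 0x04. The following sketch shows that mapping; the names `LINK_SPEED_BY_BIT` and `link_speed` are hypothetical, and the kernel itself does this in C inside pseries_root_bridge_prepare().

```python
# Bit position in the Supported Link Speeds Vector -> link speed.
LINK_SPEED_BY_BIT = {
    0: "2.5 GT/s",
    1: "5.0 GT/s",
    2: "8.0 GT/s",
}

# Value with exactly that bit set -> link speed (0x01, 0x02, 0x04).
LINK_SPEED_BY_VALUE = {1 << bit: speed for bit, speed in LINK_SPEED_BY_BIT.items()}


def link_speed(value):
    """Translate a one-bit speed value into a readable speed, or 'unknown'."""
    return LINK_SPEED_BY_VALUE.get(value, "unknown")
```

With this table, `link_speed(0x04)` yields the Gen3 speed, which is the case the patch adds.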
    • powerpc/pseries: Fix regression on PCI link speed · b020cc6c
      Kleber Sacilotto de Souza committed
      Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
      introduced a regression on the PCI link speed detection using the
      device-tree property. The ibm,pcie-link-speed-stats property is composed
      of two 32-bit integers, the first one being the maximum link speed and
      the second the current link speed. The changes introduced by the
      aforementioned commit consider only the first integer.
      
      Fix this issue by changing how the property is accessed, using the
      helper functions to properly access the array of values. The explicit
      byte swapping is not needed anymore here, since it's done by the helper
      functions.
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
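The property layout described in this commit message, two 32-bit integers with the maximum speed first and the current speed second, can be decoded as shown below. This is a userspace sketch rather than the kernel code: it assumes the raw property bytes are available and relies on device-tree cells being stored big-endian. The function name `parse_link_speed_stats` is hypothetical.

```python
import struct


def parse_link_speed_stats(prop):
    """Decode (max_speed, current_speed) from the raw property bytes."""
    if len(prop) < 8:
        raise ValueError("property must hold two 32-bit cells")
    # ">II" reads two big-endian unsigned 32-bit integers, so no separate
    # byte swap is needed on a little-endian host.
    max_speed, current_speed = struct.unpack_from(">II", prop)
    return max_speed, current_speed


# Example bytes: maximum 0x04, current 0x02.
raw = bytes.fromhex("0000000400000002")
max_speed, current_speed = parse_link_speed_stats(raw)
```

Reading both cells in one call mirrors the fix, which accesses the property as an array of values instead of looking only at the first integer.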
    • powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664
      Kevin Hao committed
      Guenter Roeck got the following call trace on a p2020 board:
        Kernel stack overflow in process eb3e5a00, r1=eb79df90
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        task: eb3e5a00 ti: c0616000 task.ti: ef440000
        NIP: c003a420 LR: c003a410 CTR: c0017518
        REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
        MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
        GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
        GPR08: 00000000 020b8000 00000000 00000000 44008442
        NIP [c003a420] __do_softirq+0x94/0x1ec
        LR [c003a410] __do_softirq+0x84/0x1ec
        Call Trace:
        [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
        [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
        [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
        [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
        [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
        --- Exception: 501 at 0xfcda524
            LR = 0x10024900
        Instruction dump:
        7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
        5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
        Kernel panic - not syncing: kernel stack overflow
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        Call Trace:
      
      The reason is that we have used the wrong register to calculate the
      ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
      Just fix it.
      
      As suggested by Benjamin Herrenschmidt, also add the C prototype of the
      function in the comment to avoid this kind of error in the future.
      
      Cc: stable@vger.kernel.org # 3.12
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 11 Feb, 2014 14 commits
  3. 29 Jan, 2014 12 commits
  4. 28 Jan, 2014 2 commits
  5. 27 Jan, 2014 3 commits
    • KVM: PPC: Book3S PR: Cope with doorbell interrupts · 40688909
      Paul Mackerras committed
      When the PR host is running on a POWER8 machine in POWER8 mode, it
      will use doorbell interrupts for IPIs.  If one of them arrives while
      we are in the guest, we pop out of the guest with trap number 0xA00,
      which isn't handled by kvmppc_handle_exit_pr, leading to the following
      BUG_ON:
      
      [  331.436215] exit_nr=0xa00 | pc=0x1d2c | msr=0x800000000000d032
      [  331.437522] ------------[ cut here ]------------
      [  331.438296] kernel BUG at arch/powerpc/kvm/book3s_pr.c:982!
      [  331.439063] Oops: Exception in kernel mode, sig: 5 [#2]
      [  331.439819] SMP NR_CPUS=1024 NUMA pSeries
      [  331.440552] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw virtio_net kvm binfmt_misc ibmvscsi scsi_transport_srp scsi_tgt virtio_blk
      [  331.447614] CPU: 11 PID: 1296 Comm: qemu-system-ppc Tainted: G      D      3.11.7-200.2.fc19.ppc64p7 #1
      [  331.448920] task: c0000003bdc8c000 ti: c0000003bd32c000 task.ti: c0000003bd32c000
      [  331.450088] NIP: d0000000025d6b9c LR: d0000000025d6b98 CTR: c0000000004cfdd0
      [  331.451042] REGS: c0000003bd32f420 TRAP: 0700   Tainted: G      D       (3.11.7-200.2.fc19.ppc64p7)
      [  331.452331] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28004824  XER: 20000000
      [  331.454616] SOFTE: 1
      [  331.455106] CFAR: c000000000848bb8
      [  331.455726]
      GPR00: d0000000025d6b98 c0000003bd32f6a0 d0000000026017b8 0000000000000032
      GPR04: c0000000018627f8 c000000001873208 320d0a3030303030 3030303030643033
      GPR08: c000000000c490a8 0000000000000000 0000000000000000 0000000000000002
      GPR12: 0000000028004822 c00000000fdc6300 0000000000000000 00000100076ec310
      GPR16: 000000002ae343b8 00003ffffd397398 0000000000000000 0000000000000000
      GPR20: 00000100076f16f4 00000100076ebe60 0000000000000008 ffffffffffffffff
      GPR24: 0000000000000000 0000008001041e60 0000000000000000 0000008001040ce8
      GPR28: c0000003a2d80000 0000000000000a00 0000000000000001 c0000003a2681810
      [  331.466504] NIP [d0000000025d6b9c] .kvmppc_handle_exit_pr+0x75c/0xa80 [kvm]
      [  331.466999] LR [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm]
      [  331.467517] Call Trace:
      [  331.467909] [c0000003bd32f6a0] [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm] (unreliable)
      [  331.468553] [c0000003bd32f750] [d0000000025d98f0] kvm_start_lightweight+0xb4/0xc4 [kvm]
      [  331.469189] [c0000003bd32f920] [d0000000025d7648] .kvmppc_vcpu_run_pr+0xd8/0x270 [kvm]
      [  331.469838] [c0000003bd32f9c0] [d0000000025cf748] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
      [  331.470790] [c0000003bd32fa50] [d0000000025cc19c] .kvm_arch_vcpu_ioctl_run+0x5c/0x1b0 [kvm]
      [  331.471401] [c0000003bd32fae0] [d0000000025c4888] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
      [  331.472026] [c0000003bd32fc90] [c00000000026192c] .do_vfs_ioctl+0x4dc/0x7a0
      [  331.472561] [c0000003bd32fd80] [c000000000261cc4] .SyS_ioctl+0xd4/0xf0
      [  331.473095] [c0000003bd32fe30] [c000000000009ed8] syscall_exit+0x0/0x98
      [  331.473633] Instruction dump:
      [  331.473766] 4bfff9b4 2b9d0800 419efc18 60000000 60420000 3d220000 e8bf11a0 e8df12a8
      [  331.474733] 7fa4eb78 e8698660 48015165 e8410028 <0fe00000> 813f00e4 3ba00000 39290001
      [  331.475386] ---[ end trace 49fc47d994c1f8f2 ]---
      [  331.479817]
      
      This fixes the problem by making kvmppc_handle_exit_pr() recognize the
      interrupt.  We also need to jump to the doorbell interrupt handler in
      book3s_segment.S to handle the interrupt on the way out of the guest.
      Having done that, there's nothing further to be done in
      kvmppc_handle_exit_pr().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
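The shape of the fix, recognizing trap number 0xA00 in the exit handler instead of falling through to the BUG_ON, can be sketched as a small dispatcher. This is a toy model and not the kernel's kvmppc_handle_exit_pr(): the constant names, the `RESUME_GUEST` string, and the use of an exception in place of BUG_ON are assumptions, and the decrementer case is included only as an example of another exit that needs no further work.

```python
INTERRUPT_DECREMENTER = 0x900
INTERRUPT_DOORBELL = 0xA00   # trap number quoted in the commit message

RESUME_GUEST = "resume_guest"


def handle_exit(exit_nr):
    """Decide what to do after leaving the guest with trap `exit_nr`."""
    if exit_nr in (INTERRUPT_DECREMENTER, INTERRUPT_DOORBELL):
        # The host interrupt handler has already run on the way out of the
        # guest, so nothing further is needed here.
        return RESUME_GUEST
    # Stand-in for the BUG_ON hit by unrecognized exit numbers.
    raise RuntimeError(f"unhandled exit_nr={exit_nr:#x}")
```

Before the fix, 0xA00 would have taken the final branch, which corresponds to the `exit_nr=0xa00` line at the top of the quoted log.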
    • KVM: PPC: Book3S HV: Add software abort codes for transactional memory · b17dfec0
      Michael Neuling committed
      This adds the software abort code defines for transactional memory (TM).
      These values are from PAPR.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411
      Michael Neuling committed
      Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
      asm-offset bits that are going to be required.

      This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
      CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes to
      ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Many of the
      added #ifdefs are removed in a later patch when the bulk of the TM code is
      added.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix merge conflict]
      Signed-off-by: Alexander Graf <agraf@suse.de>