1. 07 3月, 2014 1 次提交
    • M
      powerpc/tm: Fix crash when forking inside a transaction · 621b5060
      Michael Neuling 提交于
      When we fork/clone we currently don't copy any of the TM state to the new
      thread.  This results in a TM bad thing (program check) when the new process is
      switched in as the kernel does a tmrechkpt with TEXASR FS not set.  Also, since
      R1 is from userspace, we trigger the bad kernel stack pointer detection.  So we
      end up with something like this:
      
         Bad kernel stack pointer 0 at c0000000000404fc
         cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40]
             pc: c0000000000404fc: restore_gprs+0xc0/0x148
             lr: 0000000000000000
             sp: 0
            msr: 9000000100201030
           current = 0xc000001dd1417c30
           paca    = 0xc00000000fe00800   softe: 0        irq_happened: 0x01
             pid   = 0, comm = swapper/2
         WARNING: exception is not recoverable, can't continue
      
      The below fixes this by flushing the TM state before we copy the task_struct to
      the clone.  To do this we go through the tmreclaim patch, which removes the
      checkpointed registers from the CPU and transitions the CPU out of TM suspend
      mode.  Hence we need to call tmrechkpt after to restore the checkpointed state
      and the TM mode for the current task.
      
      To make this fail from userspace is simply:
      	tbegin
      	li	r0, 2
      	sc
      	<boom>
      
      Kudos to Adhemerval Zanella Neto for finding this.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      cc: Adhemerval Zanella Neto <azanella@br.ibm.com>
      cc: stable@vger.kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      621b5060
  2. 28 2月, 2014 8 次提交
  3. 17 2月, 2014 11 次提交
    • G
      powerpc/eeh: Disable EEH on reboot · 66f9af83
      Gavin Shan 提交于
      We possiblly detect EEH errors during reboot, particularly in kexec
      path, but it's impossible for device drivers and EEH core to handle
      or recover them properly.
      
      The patch registers one reboot notifier for EEH and disable EEH
      subsystem during reboot. That means the EEH errors is going to be
      cleared by hardware reset or second kernel during early stage of
      PCI probe.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      66f9af83
    • G
      powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan 提交于
      The patch cleans up variable eeh_subsystem_enabled so that we needn't
      refer the variable directly from external. Instead, we will use
      function eeh_enabled() and eeh_set_enable() to operate the variable.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2ec5a0ad
    • G
      powerpc/powernv: Rework EEH reset · 5b2e198e
      Gavin Shan 提交于
      When doing reset in order to recover the affected PE, we issue
      hot reset on PE primary bus if it's not root bus. Otherwise, we
      issue hot or fundamental reset on root port or PHB accordingly.
      For the later case, we didn't cover the situation where PE only
      includes root port and it potentially causes kernel crash upon
      EEH error to the PE.
      
      The patch reworks the logic of EEH reset to improve the code
      readability and also avoid the kernel crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5b2e198e
    • A
      powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1
      Anton Blanchard 提交于
      We are seeing a lot of hits in the VDSO that are not resolved by perf.
      A while(1) gettimeofday() loop shows the issue:
      
      27.64%  [vdso]  [.] 0x000000000000060c
      22.57%  [vdso]  [.] 0x0000000000000628
      16.88%  [vdso]  [.] 0x0000000000000610
      12.39%  [vdso]  [.] __kernel_gettimeofday
       6.09%  [vdso]  [.] 0x00000000000005f8
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We are using a stripped VDSO image which means only symbols with
      relocation info can be resolved. There isn't a lot of point to
      stripping the VDSO, the debug info is only about 1kB:
      
      4680 arch/powerpc/kernel/vdso64/vdso64.so
      5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
      
      By using the unstripped image, we can resolve all the symbols in the
      VDSO and the perf profile data looks much better:
      
      76.53%  [vdso]  [.] __do_get_tspec
      12.20%  [vdso]  [.] __kernel_gettimeofday
       5.05%  [vdso]  [.] __get_datapage
       3.20%  test    [.] main
       2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      24b659a1
    • A
      powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard 提交于
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a0a4419e
    • A
      mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9
      Aneesh Kumar K.V 提交于
      Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
      a hash table MMU for various reasons (the flush is handled as part of
      the PTE modification when necessary).
      
      ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.
      
      Additionally ppc64 require the tlb flushing to be batched within ptl locks.
      
      The reason to do that is to ensure that the hash page table is in sync with
      linux page table.
      
      We track the hpte index in linux pte and if we clear them without flushing
      hash and drop the ptl lock, we can have another cpu update the pte and can
      end up with duplicate entry in the hash table, which is fatal.
      
      We also want to keep set_pte_at simpler by not requiring them to do hash
      flush for performance reason. We do that by assuming that set_pte_at() is
      never *ever* called on a PTE that is already valid.
      
      This was the case until the NUMA code went in which broke that assumption.
      
      Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
      way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
      using set_pte_at() and a powerpc specific one using the appropriate
      mechanism needed to keep the hash table in sync.
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      56eecdb9
    • A
      9d85d586
    • A
      powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d
      Aneesh Kumar K.V 提交于
      pte_update() is a powerpc-ism used to change the bits of a PTE
      when the access permission is being restricted (a flush is
      potentially needed).
      
      It uses atomic operations on when needed and handles the hash
      synchronization on hash based processors.
      
      It is currently only used to clear PTE bits and so the current
      implementation doesn't provide a way to also set PTE bits.
      
      The new _PAGE_NUMA bit, when set, is actually restricting access
      so it must use that function too, so this change adds the ability
      for pte_update() to also set bits.
      
      We will use this later to set the _PAGE_NUMA bit.
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      88247e8d
    • K
      powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a
      Kleber Sacilotto de Souza 提交于
      Rev3 of the PCI Express Base Specification defines a Supported Link
      Speeds Vector where the bit definitions within this field are:
      
      Bit 0 - 2.5 GT/s
      Bit 1 - 5.0 GT/s
      Bit 2 - 8.0 GT/s
      
      This vector definition is used by the platform firmware to export the
      maximum and current link speeds of the PCI bus via the
      "ibm,pcie-link-speed-stats" device-tree property.
      
      This patch updates pseries_root_bridge_prepare() to detect Gen3
      speed buses (defined by 0x04).
      Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      49d9684a
    • K
      powerpc/pseries: Fix regression on PCI link speed · b020cc6c
      Kleber Sacilotto de Souza 提交于
      Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
      introduced a regression on the PCI link speed detection using the
      device-tree property. The ibm,pcie-link-speed-stats property is composed
      of two 32-bit integers, the first one being the maxinum link speed and
      the second the current link speed. The changes introduced by the
      aforementioned commit are considering just the first integer.
      
      Fix this issue by changing how the property is accessed, using the
      helper functions to properly access the array of values. The explicit
      byte swapping is not needed anymore here, since it's done by the helper
      functions.
      Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b020cc6c
    • K
      powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664
      Kevin Hao 提交于
      Guenter Roeck has got the following call trace on a p2020 board:
        Kernel stack overflow in process eb3e5a00, r1=eb79df90
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        task: eb3e5a00 ti: c0616000 task.ti: ef440000
        NIP: c003a420 LR: c003a410 CTR: c0017518
        REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
        MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
        GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
        GPR08: 00000000 020b8000 00000000 00000000 44008442
        NIP [c003a420] __do_softirq+0x94/0x1ec
        LR [c003a410] __do_softirq+0x84/0x1ec
        Call Trace:
        [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
        [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
        [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
        [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
        [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
        --- Exception: 501 at 0xfcda524
            LR = 0x10024900
        Instruction dump:
        7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
        5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
        Kernel panic - not syncing: kernel stack overflow
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        Call Trace:
      
      The reason is that we have used the wrong register to calculate the
      ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
      Just fix it.
      
      As suggested by Benjamin Herrenschmidt, also add the C prototype of the
      function in the comment in order to avoid such kind of errors in the
      future.
      
      Cc: stable@vger.kernel.org # 3.12
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NKevin Hao <haokexin@gmail.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1a18a664
  4. 11 2月, 2014 15 次提交
  5. 10 2月, 2014 5 次提交