1. 07 3月, 2014 4 次提交
  2. 28 2月, 2014 8 次提交
  3. 17 2月, 2014 10 次提交
    • G
      powerpc/eeh: Disable EEH on reboot · 66f9af83
      Gavin Shan 提交于
      We possiblly detect EEH errors during reboot, particularly in kexec
      path, but it's impossible for device drivers and EEH core to handle
      or recover them properly.
      
      The patch registers one reboot notifier for EEH and disable EEH
      subsystem during reboot. That means the EEH errors is going to be
      cleared by hardware reset or second kernel during early stage of
      PCI probe.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      66f9af83
    • G
      powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan 提交于
      The patch cleans up variable eeh_subsystem_enabled so that we needn't
      refer the variable directly from external. Instead, we will use
      function eeh_enabled() and eeh_set_enable() to operate the variable.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2ec5a0ad
    • G
      powerpc/powernv: Rework EEH reset · 5b2e198e
      Gavin Shan 提交于
      When doing reset in order to recover the affected PE, we issue
      hot reset on PE primary bus if it's not root bus. Otherwise, we
      issue hot or fundamental reset on root port or PHB accordingly.
      For the later case, we didn't cover the situation where PE only
      includes root port and it potentially causes kernel crash upon
      EEH error to the PE.
      
      The patch reworks the logic of EEH reset to improve the code
      readability and also avoid the kernel crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5b2e198e
    • A
      powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1
      Anton Blanchard 提交于
      We are seeing a lot of hits in the VDSO that are not resolved by perf.
      A while(1) gettimeofday() loop shows the issue:
      
      27.64%  [vdso]  [.] 0x000000000000060c
      22.57%  [vdso]  [.] 0x0000000000000628
      16.88%  [vdso]  [.] 0x0000000000000610
      12.39%  [vdso]  [.] __kernel_gettimeofday
       6.09%  [vdso]  [.] 0x00000000000005f8
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We are using a stripped VDSO image which means only symbols with
      relocation info can be resolved. There isn't a lot of point to
      stripping the VDSO, the debug info is only about 1kB:
      
      4680 arch/powerpc/kernel/vdso64/vdso64.so
      5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
      
      By using the unstripped image, we can resolve all the symbols in the
      VDSO and the perf profile data looks much better:
      
      76.53%  [vdso]  [.] __do_get_tspec
      12.20%  [vdso]  [.] __kernel_gettimeofday
       5.05%  [vdso]  [.] __get_datapage
       3.20%  test    [.] main
       2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      24b659a1
    • A
      powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard 提交于
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a0a4419e
    • A
      mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9
      Aneesh Kumar K.V 提交于
      Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
      a hash table MMU for various reasons (the flush is handled as part of
      the PTE modification when necessary).
      
      ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.
      
      Additionally ppc64 require the tlb flushing to be batched within ptl locks.
      
      The reason to do that is to ensure that the hash page table is in sync with
      linux page table.
      
      We track the hpte index in linux pte and if we clear them without flushing
      hash and drop the ptl lock, we can have another cpu update the pte and can
      end up with duplicate entry in the hash table, which is fatal.
      
      We also want to keep set_pte_at simpler by not requiring them to do hash
      flush for performance reason. We do that by assuming that set_pte_at() is
      never *ever* called on a PTE that is already valid.
      
      This was the case until the NUMA code went in which broke that assumption.
      
      Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
      way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
      using set_pte_at() and a powerpc specific one using the appropriate
      mechanism needed to keep the hash table in sync.
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      56eecdb9
    • A
      powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d
      Aneesh Kumar K.V 提交于
      pte_update() is a powerpc-ism used to change the bits of a PTE
      when the access permission is being restricted (a flush is
      potentially needed).
      
      It uses atomic operations on when needed and handles the hash
      synchronization on hash based processors.
      
      It is currently only used to clear PTE bits and so the current
      implementation doesn't provide a way to also set PTE bits.
      
      The new _PAGE_NUMA bit, when set, is actually restricting access
      so it must use that function too, so this change adds the ability
      for pte_update() to also set bits.
      
      We will use this later to set the _PAGE_NUMA bit.
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      88247e8d
    • K
      powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a
      Kleber Sacilotto de Souza 提交于
      Rev3 of the PCI Express Base Specification defines a Supported Link
      Speeds Vector where the bit definitions within this field are:
      
      Bit 0 - 2.5 GT/s
      Bit 1 - 5.0 GT/s
      Bit 2 - 8.0 GT/s
      
      This vector definition is used by the platform firmware to export the
      maximum and current link speeds of the PCI bus via the
      "ibm,pcie-link-speed-stats" device-tree property.
      
      This patch updates pseries_root_bridge_prepare() to detect Gen3
      speed buses (defined by 0x04).
      Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      49d9684a
    • K
      powerpc/pseries: Fix regression on PCI link speed · b020cc6c
      Kleber Sacilotto de Souza 提交于
      Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
      introduced a regression on the PCI link speed detection using the
      device-tree property. The ibm,pcie-link-speed-stats property is composed
      of two 32-bit integers, the first one being the maxinum link speed and
      the second the current link speed. The changes introduced by the
      aforementioned commit are considering just the first integer.
      
      Fix this issue by changing how the property is accessed, using the
      helper functions to properly access the array of values. The explicit
      byte swapping is not needed anymore here, since it's done by the helper
      functions.
      Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b020cc6c
    • K
      powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664
      Kevin Hao 提交于
      Guenter Roeck has got the following call trace on a p2020 board:
        Kernel stack overflow in process eb3e5a00, r1=eb79df90
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        task: eb3e5a00 ti: c0616000 task.ti: ef440000
        NIP: c003a420 LR: c003a410 CTR: c0017518
        REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
        MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
        GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
        GPR08: 00000000 020b8000 00000000 00000000 44008442
        NIP [c003a420] __do_softirq+0x94/0x1ec
        LR [c003a410] __do_softirq+0x84/0x1ec
        Call Trace:
        [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
        [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
        [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
        [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
        [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
        --- Exception: 501 at 0xfcda524
            LR = 0x10024900
        Instruction dump:
        7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
        5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
        Kernel panic - not syncing: kernel stack overflow
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        Call Trace:
      
      The reason is that we have used the wrong register to calculate the
      ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
      Just fix it.
      
      As suggested by Benjamin Herrenschmidt, also add the C prototype of the
      function in the comment in order to avoid such kind of errors in the
      future.
      
      Cc: stable@vger.kernel.org # 3.12
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NKevin Hao <haokexin@gmail.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1a18a664
  4. 11 2月, 2014 14 次提交
  5. 08 2月, 2014 3 次提交
    • M
      arm64: defconfig: Expand default enabled features · 55834a77
      Mark Rutland 提交于
      FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
      in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
      Versatile Express. As these attach to a Motherboard Express V2M-P1 it
      would be useful to have support for some V2M-P1 peripherals enabled by
      default.
      
      Additionally a couple of of features have been introduced since the last
      defconfig update (CMA, jump labels) that would be good to have enabled
      by default to ensure they are build and boot tested.
      
      This patch updates the arm64 defconfig to enable support for these
      devices and features. The arm64 Kconfig is modified to select
      HAVE_PATA_PLATFORM, which is required to enable support for the
      CompactFlash controller on the V2M-P1.
      
      A few options which don't need to appear in defconfig are trimmed:
      
      * BLK_DEV - selected by default
      * EXPERIMENTAL - otherwise gone from the kernel
      * MII - selected by drivers which require it
      * USB_SUPPORT - selected by default
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      55834a77
    • W
      arm64: asm: remove redundant "cc" clobbers · 95c41896
      Will Deacon 提交于
      cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
      from inline asm blocks that only use these instructions to implement
      conditional branches.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      95c41896
    • W
      arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4
      Will Deacon 提交于
      Linux requires a number of atomic operations to provide full barrier
      semantics, that is no memory accesses after the operation can be
      observed before any accesses up to and including the operation in
      program order.
      
      On arm64, these operations have been incorrectly implemented as follows:
      
      	// A, B, C are independent memory locations
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldaxr	x0, [B]		// Exclusive load with acquire
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      
      	<Access [C]>
      
      The assumption here being that two half barriers are equivalent to a
      full barrier, so the only permitted ordering would be A -> B -> C
      (where B is the atomic operation involving both a load and a store).
      
      Unfortunately, this is not the case by the letter of the architecture
      and, in fact, the accesses to A and C are permitted to pass their
      nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
      or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
      store-release on B). This is a clear violation of the full barrier
      requirement.
      
      The simple way to fix this is to implement the same algorithm as ARMv7
      using explicit barriers:
      
      	<Access [A]>
      
      	// atomic_op (B)
      	dmb	ish		// Full barrier
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stxr	w1, x0, [B]	// Exclusive store
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      but this has the undesirable effect of introducing *two* full barrier
      instructions. A better approach is actually the following, non-intuitive
      sequence:
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      The simple observations here are:
      
        - The dmb ensures that no subsequent accesses (e.g. the access to C)
          can enter or pass the atomic sequence.
      
        - The dmb also ensures that no prior accesses (e.g. the access to A)
          can pass the atomic sequence.
      
        - Therefore, no prior access can pass a subsequent access, or
          vice-versa (i.e. A is strictly ordered before C).
      
        - The stlxr ensures that no prior access can pass the store component
          of the atomic operation.
      
      The only tricky part remaining is the ordering between the ldxr and the
      access to A, since the absence of the first dmb means that we're now
      permitting re-ordering between the ldxr and any prior accesses.
      
      From an (arbitrary) observer's point of view, there are two scenarios:
      
        1. We have observed the ldxr. This means that if we perform a store to
           [B], the ldxr will still return older data. If we can observe the
           ldxr, then we can potentially observe the permitted re-ordering
           with the access to A, which is clearly an issue when compared to
           the dmb variant of the code. Thankfully, the exclusive monitor will
           save us here since it will be cleared as a result of the store and
           the ldxr will retry. Notice that any use of a later memory
           observation to imply observation of the ldxr will also imply
           observation of the access to A, since the stlxr/dmb ensure strict
           ordering.
      
        2. We have not observed the ldxr. This means we can perform a store
           and influence the later ldxr. However, that doesn't actually tell
           us anything about the access to [A], so we've not lost anything
           here either when compared to the dmb variant.
      
      This patch implements this solution for our barriered atomic operations,
      ensuring that we satisfy the full barrier requirements where they are
      needed.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      8e86f0b4
  6. 07 2月, 2014 1 次提交
    • T
      arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved. · 7bc35fdd
      Tang Chen 提交于
      The following path will cause array out of bound.
      
      memblock_add_region() will always set nid in memblock.reserved to
      MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
      correct valus in memblock.reserved, we called setup_node_data(), and
      used memblock_alloc_nid() to allocate memory, with nid set to
      MAX_NUMNODES.
      
      The nodemask_t type can be seen as a bit array.  And the index is 0 ~
      MAX_NUMNODES-1.
      
      After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
      the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
      MAX_NUMNODES-1].
      
      See below:
      
      numa_init()
       |---> numa_register_memblks()
       |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
       |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
       |      |......
       |      |---> setup_node_data()
       |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
       |......
       |---> numa_clear_kernel_node_hotplug()
              |---> node_set()			here, we have an index 1024, and overflowed
      
      This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
      this problem.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bc35fdd
新手
引导
客服 返回
顶部