1. 12 Mar, 2014 1 commit
    • G
      of: Make device nodes kobjects so they show up in sysfs · 75b57ecf
      Grant Likely committed
      Device tree nodes are already treated as objects, and we already want to
      expose them to userspace which is done using the /proc filesystem today.
      Right now the kernel has to do a lot of work to keep the /proc view in
      sync with the in-kernel representation. If device_nodes are switched to
      be kobjects then the device tree code can be a whole lot simpler. It
      also turns out that switching to using /sysfs from /proc results in
      smaller code and data size, and the userspace ABI won't change if
      /proc/device-tree symlinks to /sys/firmware/devicetree/base.
      
      v7: Add missing sysfs_bin_attr_init()
      v6: Add __of_add_property() early init fixes from Pantelis
      v5: Rename firmware/ofw to firmware/devicetree
          Fix updating property values in sysfs
      v4: Fixed build error on Powerpc
          Fixed handling of dynamic nodes on powerpc
      v3: Fixed handling of duplicate attribute and child node names
      v2: switch to using sysfs bin_attributes which solve the problem of
          reporting incorrect property size.
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
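Because the properties are exposed as sysfs binary attributes, userspace can read a property's raw bytes with ordinary file I/O. The following is a minimal sketch in C, not code from the patch: `read_dt_property` is a hypothetical helper, and the node and property names are whatever the caller supplies. Only the base path `/sys/firmware/devicetree/base` comes from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to buflen raw bytes of a device tree
 * property exposed as a file under `base` (for example
 * "/sys/firmware/devicetree/base"). Returns the number of bytes read,
 * or -1 if the path is too long or the file cannot be opened. */
long read_dt_property(const char *base, const char *node,
                      const char *prop, unsigned char *buf,
                      size_t buflen)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s/%s", base, node, prop);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* Properties are binary blobs, so open in binary mode. */
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t got = fread(buf, 1, buflen, f);
    fclose(f);
    return (long)got;
}
```

Taking the base directory as a parameter keeps the sketch usable with either `/proc/device-tree` or the sysfs path, which the commit message says are meant to present the same view.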
  2. 28 Feb, 2014 8 commits
  3. 17 Feb, 2014 10 commits
    • G
      powerpc/eeh: Disable EEH on reboot · 66f9af83
      Gavin Shan committed
      We may detect EEH errors during reboot, particularly in the kexec
      path, but it's impossible for device drivers and the EEH core to
      handle or recover from them properly.
      
      The patch registers a reboot notifier for EEH and disables the EEH
      subsystem during reboot. That means the EEH errors are going to be
      cleared by hardware reset or by the second kernel during the early
      stage of PCI probe.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • G
      powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad
      Gavin Shan committed
      The patch cleans up the variable eeh_subsystem_enabled so that external
      code needn't refer to it directly. Instead, the functions
      eeh_enabled() and eeh_set_enable() are used to read and set the variable.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
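The accessor pattern described in this commit can be sketched in plain C. This is a userspace model only: the two function names come from the commit message, but their signatures and the type of the backing variable are assumptions made for illustration.

```c
#include <stdbool.h>

/* File-local state: other translation units cannot touch it directly. */
static bool eeh_subsystem_enabled;

/* Read accessor (signature assumed for illustration). */
bool eeh_enabled(void)
{
    return eeh_subsystem_enabled;
}

/* Write accessor (signature assumed for illustration). */
void eeh_set_enable(bool mode)
{
    eeh_subsystem_enabled = mode;
}
```

Hiding the variable behind functions means later changes, such as extra conditions in the read path, need no changes in callers.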
    • G
      powerpc/powernv: Rework EEH reset · 5b2e198e
      Gavin Shan committed
      When doing a reset in order to recover the affected PE, we issue a
      hot reset on the PE's primary bus if it's not the root bus. Otherwise,
      we issue a hot or fundamental reset on the root port or PHB accordingly.
      For the latter case, we didn't cover the situation where the PE only
      includes the root port, which can cause a kernel crash upon an
      EEH error on the PE.
      
      The patch reworks the logic of EEH reset to improve the code
      readability and also avoid the kernel crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • A
      powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1
      Anton Blanchard committed
      We are seeing a lot of hits in the VDSO that are not resolved by perf.
      A while(1) gettimeofday() loop shows the issue:
      
      27.64%  [vdso]  [.] 0x000000000000060c
      22.57%  [vdso]  [.] 0x0000000000000628
      16.88%  [vdso]  [.] 0x0000000000000610
      12.39%  [vdso]  [.] __kernel_gettimeofday
       6.09%  [vdso]  [.] 0x00000000000005f8
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We are using a stripped VDSO image which means only symbols with
      relocation info can be resolved. There isn't a lot of point to
      stripping the VDSO, the debug info is only about 1kB:
      
      4680 arch/powerpc/kernel/vdso64/vdso64.so
      5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg
      
      By using the unstripped image, we can resolve all the symbols in the
      VDSO and the perf profile data looks much better:
      
      76.53%  [vdso]  [.] __do_get_tspec
      12.20%  [vdso]  [.] __kernel_gettimeofday
       5.05%  [vdso]  [.] __get_datapage
       3.20%  test    [.] main
       2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • A
      powerpc: Link VDSOs at 0x0 · a0a4419e
      Anton Blanchard committed
      perf is failing to resolve symbols in the VDSO. A while (1)
      gettimeofday() loop shows:
      
      93.99%  [vdso]  [.] 0x00000000000005e0
       3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.81%  test    [.] main
      
      The reason for this is that we are linking our VDSO shared libraries
      at 1MB, which is a little weird. Even though this is uncommon, Alan
      points out that it is valid and we should probably fix perf userspace.
      
      Regardless, I can't see a reason why we are doing this. The code
      is all position independent and we never rely on the VDSO ending
      up at 1M (and we never place it there on 64bit tasks).
      
      Changing our link address to 0x0 fixes perf VDSO symbol resolution:
      
      73.18%  [vdso]  [.] 0x000000000000060c
      12.39%  [vdso]  [.] __kernel_gettimeofday
       3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
       2.94%  [vdso]  [.] __kernel_datapage_offset
       2.90%  test    [.] main
      
      We still have some local symbol resolution issues that will be
      fixed in a subsequent patch.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • A
      mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9
      Aneesh Kumar K.V committed
      Archs like ppc64 don't do a tlb flush in the set_pte/pmd functions when
      using a hash table MMU, for various reasons (the flush is handled as part
      of the PTE modification when necessary).
      
      ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.
      
      Additionally, ppc64 requires the tlb flushing to be batched within ptl locks.
      
      The reason to do that is to ensure that the hash page table is in sync with
      linux page table.
      
      We track the hpte index in linux pte and if we clear them without flushing
      hash and drop the ptl lock, we can have another cpu update the pte and can
      end up with duplicate entry in the hash table, which is fatal.
      
      We also want to keep set_pte_at simpler by not requiring it to do a hash
      flush, for performance reasons. We do that by assuming that set_pte_at() is
      never *ever* called on a PTE that is already valid.
      
      This was the case until the NUMA code went in which broke that assumption.
      
      Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
      way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
      using set_pte_at() and a powerpc specific one using the appropriate
      mechanism needed to keep the hash table in sync.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • A
      powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d
      Aneesh Kumar K.V committed
      pte_update() is a powerpc-ism used to change the bits of a PTE
      when the access permission is being restricted (a flush is
      potentially needed).
      
      It uses atomic operations when needed and handles the hash
      synchronization on hash based processors.
      
      It is currently only used to clear PTE bits and so the current
      implementation doesn't provide a way to also set PTE bits.
      
      The new _PAGE_NUMA bit, when set, is actually restricting access
      so it must use that function too, so this change adds the ability
      for pte_update() to also set bits.
      
      We will use this later to set the _PAGE_NUMA bit.
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
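The clear-and-set semantics described in this commit can be modelled on a plain integer. The sketch below uses C11 atomics in userspace; `model_pte_update`, the `model_pte_t` type and the bit values used with it are hypothetical, and it performs none of the hash synchronization that the real pte_update() handles.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a PTE as an atomically updated 64-bit word. */
typedef _Atomic uint64_t model_pte_t;

/* Atomically clear the bits in `clr`, then set the bits in `set`.
 * Returns the value the entry held before the update. */
uint64_t model_pte_update(model_pte_t *ptep, uint64_t clr, uint64_t set)
{
    uint64_t old = atomic_load(ptep);
    uint64_t new_val;

    do {
        new_val = (old & ~clr) | set;
        /* On failure, `old` is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(ptep, &old, new_val));

    return old;
}
```

Passing `set = 0` reproduces the clear-only behaviour the commit message says the function had before this change.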
    • K
      powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a
      Kleber Sacilotto de Souza committed
      Rev3 of the PCI Express Base Specification defines a Supported Link
      Speeds Vector where the bit definitions within this field are:
      
      Bit 0 - 2.5 GT/s
      Bit 1 - 5.0 GT/s
      Bit 2 - 8.0 GT/s
      
      This vector definition is used by the platform firmware to export the
      maximum and current link speeds of the PCI bus via the
      "ibm,pcie-link-speed-stats" device-tree property.
      
      This patch updates pseries_root_bridge_prepare() to detect Gen3
      speed buses (defined by 0x04).
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
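The bit definitions listed in this commit map directly onto a small decoder. The following is a sketch only: `link_speed_tenths_gt` and the macro names are hypothetical, and it assumes the value being decoded has exactly one bit set, as the 0x04 example in the commit message suggests.

```c
#include <stdint.h>

/* Bit values taken from the vector definition quoted in the commit. */
#define LINK_SPEED_2_5GT 0x01u /* bit 0 */
#define LINK_SPEED_5_0GT 0x02u /* bit 1 */
#define LINK_SPEED_8_0GT 0x04u /* bit 2 */

/* Hypothetical helper: map a single-bit speed value to tenths of GT/s
 * (25, 50 or 80). Returns 0 for any value this sketch does not know. */
unsigned int link_speed_tenths_gt(uint32_t value)
{
    switch (value) {
    case LINK_SPEED_2_5GT: return 25;
    case LINK_SPEED_5_0GT: return 50;
    case LINK_SPEED_8_0GT: return 80;
    default:               return 0;
    }
}
```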
    • K
      powerpc/pseries: Fix regression on PCI link speed · b020cc6c
      Kleber Sacilotto de Souza committed
      Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
      introduced a regression on the PCI link speed detection using the
      device-tree property. The ibm,pcie-link-speed-stats property is composed
      of two 32-bit integers, the first one being the maximum link speed and
      the second the current link speed. The changes introduced by the
      aforementioned commit consider only the first integer.
      
      Fix this issue by changing how the property is accessed, using the
      helper functions to properly access the array of values. The explicit
      byte swapping is not needed anymore here, since it's done by the helper
      functions.
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
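The property layout described in this commit, two 32-bit integers with the maximum speed first, can be decoded from raw bytes as sketched below. This is a userspace illustration rather than the kernel's helper functions: `parse_link_speed_stats` and `be32_cell` are hypothetical names, and the sketch relies on device tree cells being stored big-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell, independent of host endianness. */
static uint32_t be32_cell(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: split the raw property bytes into maximum and
 * current link speed. Returns 0 on success, -1 if fewer than two
 * cells are present. */
int parse_link_speed_stats(const unsigned char *prop, size_t len,
                           uint32_t *max_speed, uint32_t *cur_speed)
{
    if (len < 2 * sizeof(uint32_t))
        return -1;
    *max_speed = be32_cell(prop);      /* first cell: maximum speed */
    *cur_speed = be32_cell(prop + 4);  /* second cell: current speed */
    return 0;
}
```

Doing the byte-order conversion in one place is the same idea as the commit's move to helper functions: callers never swap bytes themselves.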
    • K
      powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664
      Kevin Hao committed
      Guenter Roeck has got the following call trace on a p2020 board:
        Kernel stack overflow in process eb3e5a00, r1=eb79df90
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        task: eb3e5a00 ti: c0616000 task.ti: ef440000
        NIP: c003a420 LR: c003a410 CTR: c0017518
        REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
        MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
        GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
        GPR08: 00000000 020b8000 00000000 00000000 44008442
        NIP [c003a420] __do_softirq+0x94/0x1ec
        LR [c003a410] __do_softirq+0x84/0x1ec
        Call Trace:
        [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
        [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
        [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
        [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
        [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
        --- Exception: 501 at 0xfcda524
            LR = 0x10024900
        Instruction dump:
        7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
        5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
        Kernel panic - not syncing: kernel stack overflow
        CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
        Call Trace:
      
      The reason is that the wrong register was used to calculate the
      ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
      Just fix it.
      
      As suggested by Benjamin Herrenschmidt, also add the C prototype of the
      function in the comment in order to avoid this kind of error in the
      future.
      
      Cc: stable@vger.kernel.org # 3.12
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 11 Feb, 2014 14 commits
  5. 29 Jan, 2014 7 commits
    • B
    • T
      powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var · 94b09d75
      Tiejun Chen committed
      Replace __get_cpu_var safely with get_cpu_var to avoid
      the following call trace:
      
      [ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000]
      code: hugemmap01/9048
      [ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8
      [ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114
      [ 7253.637606] Call Trace:
      [ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable)
      [ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134
      [ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8
      [ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168
      [ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150
      [ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c
      [ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc
      [ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c
      [ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c
      [ 7253.637686] --- Exception: c01 at 0xff16004
      
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • P
      powerpc: Make sure "cache" directory is removed when offlining cpu · 91b973f9
      Paul Mackerras committed
      The code in remove_cache_dir() is supposed to remove the "cache"
      subdirectory from the sysfs directory for a CPU when that CPU is
      being offlined.  It tries to do this by calling kobject_put() on
      the kobject for the subdirectory.  However, the subdirectory only
      gets removed once the last reference goes away, and the reference
      being put here may well not be the last reference.  That means
      that the "cache" subdirectory may still exist when the offlining
      operation has finished.  If the same CPU subsequently gets onlined,
      the code tries to add a new "cache" subdirectory.  If the old
      subdirectory has not yet been removed, we get a WARN_ON in the
      sysfs code, with stack trace, and an error message printed on the
      console.  Further, we ultimately end up with an online cpu with no
      "cache" subdirectory.
      
      This fixes it by doing an explicit kobject_del() at the point where
      we want the subdirectory to go away.  kobject_del() removes the sysfs
      directory even though the object still exists in memory.  The object
      will get freed at some point in the future.  A subsequent onlining
      operation can create a new sysfs directory, even if the old object
      still exists in memory, without causing any problems.
      
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
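The difference between dropping a reference and explicitly deleting the directory can be shown with a toy model. The struct and functions below are hypothetical stand-ins written for this illustration; they are not the kernel's kobject implementation and only mirror the behaviour the commit message describes.

```c
#include <stdbool.h>

/* Toy stand-in for a kobject: a reference count plus a flag saying
 * whether its directory is currently visible. Not the kernel struct. */
struct toy_kobj {
    int refcount;
    bool dir_visible;
};

/* Like kobject_del(): hide the directory now; references are untouched. */
void toy_del(struct toy_kobj *k)
{
    k->dir_visible = false;
}

/* Like kobject_put(): drop one reference; the directory only goes away
 * as a side effect of the last reference being dropped. */
void toy_put(struct toy_kobj *k)
{
    if (--k->refcount == 0)
        k->dir_visible = false;
}
```

With two references outstanding, a single `toy_put()` leaves the directory in place, which is the situation that triggered the warning on re-onlining; `toy_del()` removes it regardless of who still holds a reference.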
    • J
      powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space · 19751c07
      jmarchan@redhat.com committed
      According to POSIX, if MAP_FIXED is specified, mmap shall fail with
      ENOMEM if the requested mapping exceeds the allowed range for the
      address space of the process. The generic code gets this right, but
      the powerpc-specific slice_get_unmapped_area() function currently
      returns -EINVAL in that case.
      This patch corrects it.
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
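The range check behind this errno change can be sketched as follows. `check_fixed_mapping` is a hypothetical function, and `task_size` stands in for the upper bound of the process address space; the sketch covers only the range test, not the other validation a real mmap path performs.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical range check for a MAP_FIXED request. Returns 0 if
 * [addr, addr + len) fits below task_size, -ENOMEM otherwise. */
int check_fixed_mapping(uint64_t addr, uint64_t len, uint64_t task_size)
{
    /* Written this way to avoid overflow in addr + len. */
    if (len > task_size || addr > task_size - len)
        return -ENOMEM;
    return 0;
}
```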
    • D
      powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform. · 2c2e6ecf
      Deepthi Dharwar committed
      This patch ports the cpuidle framework to the powernv
      platform and implements a back-end powernv cpuidle
      driver that uses the power7_nap and snooze idle states.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • D
      powerpc/pseries/cpuidle: smt-snooze-delay cleanup. · 3fa8cad8
      Deepthi Dharwar committed
      smt-snooze-delay was designed to disable the NAP state, or to delay
      entry to the NAP state, before the cpuidle framework was adopted. It
      is a per-cpu variable. With the cpuidle framework, states can be
      disabled on a per-cpu basis using the cpuidle/enable sysfs entry.
      
      Also, with the cpuidle driver each state's target residency is
      per-driver, whereas earlier it was per-device. Therefore, the per-cpu
      sysfs smt-snooze-delay, which decides the target residency of the
      idle state on a particular cpu, confuses the user, as we cannot have
      different smt-snooze-delay (target residency) values for each cpu.
      
      In the current code, smt-snooze-delay functionality is completely broken.
      It makes sense to remove smt-snooze-delay from the idle driver now
      that the cpuidle framework is in place.
      However, the sysfs files are retained as ppc64_util currently
      uses them. Once ppc64_util is fixed, the plan is to clean
      up the kernel code.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • D
      powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle. · 962e7bd4
      Deepthi Dharwar committed
      Move the file from arch specific pseries/processor_idle.c
      to drivers/cpuidle/cpuidle-pseries.c
      Make the relevant Makefile and Kconfig changes.
      Also, introduce Kconfig.powerpc in drivers/cpuidle
      for all powerpc cpuidle drivers.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>