1. 06 3月, 2017 5 次提交
    • M
      powerpc/64: Fix L1D cache shape vector reporting L1I values · 9c7a0086
      Michael Ellerman 提交于
      It seems we didn't pay quite enough attention when testing the new cache
      shape vectors, which means we didn't notice the bug where the vector for
      the L1D was using the L1I values. Fix it, resulting in eg:
      
        L1I  cache size:     0x8000      32768B         32K
        L1I  line size:        0x80       8-way associative
        L1D  cache size:    0x10000      65536B         64K
        L1D  line size:        0x80       8-way associative
      
      Fixes: 98a5f361 ("powerpc: Add new cache geometry aux vectors")
      Cut-and-paste-bug-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Badly-reviewed-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9c7a0086
    • A
      powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info() · 6ba422c7
      Anton Blanchard 提交于
      I see a panic in early boot when building with a recent gcc toolchain.
      The issue is a divide by zero, which is undefined. Older toolchains
      let us get away with it:
      
      int foo(int a) { return a / 0; }
      
      foo:
      	li 9,0
      	divw 3,3,9
      	extsw 3,3
      	blr
      
      But newer ones catch it:
      
      foo:
      	trap
      
      Add a check to avoid the divide by zero.
      
      Fixes: e2827fe5 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6ba422c7
    • S
      powerpc: Update to new option-vector-5 format for CAS · 014d02cb
      Suraj Jitindar Singh 提交于
      On POWER9 the ibm,client-architecture-support (CAS) negotiation process
      has been updated to change how the host to guest negotiation is done for
      the new hash/radix mmu as well as the nest mmu, process tables and guest
      translation shootdown (GTSE).
      
      This is documented in the unreleased PAPR ACR "CAS option vector
      additions for P9".
      
      The host tells the guest which options it supports in
      ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
      to request in the CAS call and these are agreed to in the
      ibm,architecture-vec-5 property of the chosen node.
      
      Thus we read ibm,arch-vec-5-platform-support and make our selection before
      calling CAS. We then parse the ibm,architecture-vec-5 property of the
      chosen node to check whether we should run as hash or radix.
      
      ibm,arch-vec-5-platform-support format:
      
      index value pairs: <index, val> ... <index, val>
      
      index: Option vector 5 byte number
      val:   Some representation of supported values
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      [mpe: Don't print about unknown options, be consistent with OV5_FEAT]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      014d02cb
    • S
      powerpc: Parse the command line before calling CAS · 12cc9fd6
      Suraj Jitindar Singh 提交于
      On POWER9 the hypervisor requires the guest to decide whether it would
      like to use a hash or radix mmu model at the time it calls
      ibm,client-architecture-support (CAS) based on what the hypervisor has
      said it's allowed to do. It is possible to disable radix by passing
      "disable_radix" on the command line. The next patch will add support for
      the new CAS format, thus we need to parse the command line before calling
      CAS so we can correctly select which mmu we would like to use.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NPaul Mackerras <paulus@ozlabs.org>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      12cc9fd6
    • B
      powerpc/xics: Work around limitations of OPAL XICS priority handling · a69e2fb7
      Balbir Singh 提交于
      The CPPR (Current Processor Priority Register) of a XICS interrupt
      presentation controller contains a value N, such that only interrupts
      with a priority "more favoured" than N will be received by the CPU,
      where "more favoured" means "less than". So if the CPPR has the value 5
      then only interrupts with a priority of 0-4 inclusive will be received.
      
      In theory the CPPR can support a value of 0 to 255 inclusive.
      In practice Linux only uses values of 0, 4, 5 and 0xff. Setting the CPPR
      to 0 rejects all interrupts, setting it to 0xff allows all interrupts.
      The values 4 and 5 are used to differentiate IPIs from external
      interrupts. Setting the CPPR to 5 allows IPIs to be received but not
      external interrupts.
      
      The CPPR emulation in the OPAL XICS implementation only directly
      supports priorities 0 and 0xff. All other priorities are considered
      equivalent, and mapped to a single priority value internally. This means
      when using icp-opal we can not allow IPIs but not externals.
      
      This breaks Linux's use of priority values when a CPU is hot unplugged.
      After migrating IRQs away from the CPU that is being offlined, we set
      the priority to 5, meaning we still want the offline CPU to receive
      IPIs. But the effect of the OPAL XICS emulation's use of a single
      priority value is that all interrupts are rejected by the CPU. With the
      CPU offline, and not receiving IPIs, we may not be able to wake it up to
      bring it back online.
      
      The first part of the fix is in icp_opal_set_cpu_priority(). CPPR values
      of 0 to 4 inclusive will correctly cause all interrupts to be rejected,
      so we pass those CPPR values through to OPAL. However if we are called
      with a CPPR of 5 or greater, the caller is expecting to be able to allow
      IPIs but not external interrupts. We know this doesn't work, so instead
      of rejecting all interrupts we choose the opposite which is to allow all
      interrupts. This is still not correct behaviour, but we know for the
      only existing caller (xics_migrate_irqs_away()), that it is the better
      option.
      
      The other part of the fix is in xics_migrate_irqs_away(). Instead of
      setting priority (CPPR) to 0, and then back to 5 before migrating IRQs,
      we migrate the IRQs before setting the priority back to 5. This should
      have no effect on an ICP backend with a working set_priority(), and on
      icp-opal it means we will keep all interrupts blocked until after we've
      finished doing the IRQ migration. Additionally we wait for 5ms after
      doing the migration to make sure there are no IRQs in flight.
      
      Fixes: d7436188 ("powerpc/xics: Add ICP OPAL backend")
      Cc: stable@vger.kernel.org # v4.8+
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Tested-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      [mpe: Rewrote comments and change log, change delay to 5ms]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a69e2fb7
  2. 04 3月, 2017 2 次提交
  3. 03 3月, 2017 6 次提交
    • L
      powerpc/booke: Fix boot crash due to null hugepd · 3fb66a70
      Laurentiu Tudor 提交于
      On 32-bit book-e machines, hugepd_ok() no longer takes into account null
      hugepd values, causing this crash at boot:
      
        Unable to handle kernel paging request for data at address 0x80000000
        ...
        NIP [c0018378] follow_huge_addr+0x38/0xf0
        LR [c001836c] follow_huge_addr+0x2c/0xf0
        Call Trace:
         follow_huge_addr+0x2c/0xf0 (unreliable)
         follow_page_mask+0x40/0x3e0
         __get_user_pages+0xc8/0x450
         get_user_pages_remote+0x8c/0x250
         copy_strings+0x110/0x390
         copy_strings_kernel+0x2c/0x50
         do_execveat_common+0x478/0x630
         do_execve+0x2c/0x40
         try_to_run_init_process+0x18/0x60
         kernel_init+0xbc/0x110
         ret_from_kernel_thread+0x5c/0x64
      
      This impacts all nxp (ex-freescale) 32-bit booke platforms.
      
      This was caused by the change of hugepd_t.pd from signed to unsigned,
      and the update to the nohash version of hugepd_ok(). Previously
      hugepd_ok() could exclude all non-huge and NULL pgds using > 0, whereas
      now we need to explicitly check that the value is not zero and also that
      PD_HUGE is *clear*.
      
      This isn't protected by the pgd_none() check in __find_linux_pte_or_hugepte()
      because on 32-bit we use pgtable-nopud.h, which causes the pgd_none()
      check to be always false.
      
      Fixes: 20717e1f ("powerpc/mm: Fix little-endian 4K hugetlb")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: NMadalin-Cristian Bucur <madalin.bucur@nxp.com>
      Signed-off-by: NLaurentiu Tudor <laurentiu.tudor@nxp.com>
      [mpe: Flesh out change log details.]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3fb66a70
    • N
      powerpc: Fix compiling a BE kernel with a powerpc64le toolchain · 4dc831aa
      Nicholas Piggin 提交于
      GCC can compile with either endian, but the default ABI version is set
      based on the default endianness of the toolchain. Alan Modra says:
      
        you need both -mbig and -mabi=elfv1 to make a powerpc64le gcc
        generate powerpc64 code
      
      The opposite is true for powerpc64 when generating -mlittle it
      requires -mabi=elfv2 to generate v2 ABI, which we were already doing.
      
      This change adds ABI annotations together with endianness for all cases,
      LE and BE. This fixes the case of building a BE kernel with a toolchain
      that is LE by default.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Tested-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4dc831aa
    • G
      powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop · 424f8acd
      Gautham R. Shenoy 提交于
      Commit 09206b60 ("powernv: Pass PSSCR value and mask to
      power9_idle_stop") added additional code in power_enter_stop() to
      distinguish between stop requests whose PSSCR had ESL=EC=1 from those
      which did not. When ESL=EC=1, we do a forward-jump to a location
      labelled by "1", which had the code to handle the ESL=EC=1 case.
      
      Unfortunately just a couple of instructions before this label, is the
      macro IDLE_STATE_ENTER_SEQ() which also has a label "1" in its
      expansion.
      
      As a result, the current code can result in directly executing stop
      instruction for deep stop requests with PSSCR ESL=EC=1, without saving
      the hypervisor state.
      
      Fix this BUG by labeling the location that handles ESL=EC=1 case with
      a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion
      a la .Lxx from Anton Blanchard).
      
      While at it, rename the label "2" labelling the location of the code
      handling entry into deep stop states with ".Lhandle_deep_stop".
      
      For a good measure, change the label in IDLE_STATE_ENTER_SEQ() macro
      to an not-so commonly used value in order to avoid similar mishaps in
      the future.
      
      Fixes: 09206b60 ("powernv: Pass PSSCR value and mask to power9_idle_stop")
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      424f8acd
    • P
      powerpc/64: Invalidate process table caching after setting process table · 7a70d728
      Paul Mackerras 提交于
      The POWER9 MMU reads and caches entries from the process table.
      When we kexec from one kernel to another, the second kernel sets
      its process table pointer but doesn't currently do anything to
      make the CPU invalidate any cached entries from the old process table.
      This adds a tlbie (TLB invalidate entry) instruction with parameters
      to invalidate caching of the process table after the new process
      table is installed.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7a70d728
    • R
      powerpc: emulate_step() tests for load/store instructions · 4ceae137
      Ravi Bangoria 提交于
      Add new selftest that test emulate_step for Normal, Floating Point,
      Vector and Vector Scalar - load/store instructions. Test should run
      at boot time if CONFIG_KPROBES_SANITY_TEST and CONFIG_PPC64 is set.
      
      Sample log:
      
        emulate_step_test: ld             : PASS
        emulate_step_test: lwz            : PASS
        emulate_step_test: lwzx           : PASS
        emulate_step_test: std            : PASS
        emulate_step_test: ldarx / stdcx. : PASS
        emulate_step_test: lfsx           : PASS
        emulate_step_test: stfsx          : PASS
        emulate_step_test: lfdx           : PASS
        emulate_step_test: stfdx          : PASS
        emulate_step_test: lvx            : PASS
        emulate_step_test: stvx           : PASS
        emulate_step_test: lxvd2x         : PASS
        emulate_step_test: stxvd2x        : PASS
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      [mpe: Drop start/complete lines, make it all __init]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4ceae137
    • R
      powerpc: Emulation support for load/store instructions on LE · e148bd17
      Ravi Bangoria 提交于
      emulate_step() uses a number of underlying kernel functions that were
      initially not enabled for LE. This has been rectified since. So, fix
      emulate_step() for LE for the corresponding instructions.
      
      Cc: stable@vger.kernel.org # v3.18+
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e148bd17
  4. 28 2月, 2017 5 次提交
  5. 25 2月, 2017 4 次提交
  6. 23 2月, 2017 5 次提交
    • M
      lib/show_mem.c: teach show_mem to work with the given nodemask · 9af744d7
      Michal Hocko 提交于
      show_mem() allows to filter out node specific data which is irrelevant
      to the allocation request via SHOW_MEM_FILTER_NODES.  The filtering is
      done in skip_free_areas_node which skips all nodes which are not in the
      mems_allowed of the current process.  This works most of the time as
      expected because the nodemask shouldn't be outside of the allocating
      task but there are some exceptions.  E.g.  memory hotplug might want to
      request allocations from outside of the allowed nodes (see
      new_node_page).
      
      Get rid of this hardcoded behavior and push the allocation mask down the
      show_mem path and use it instead of cpuset_current_mems_allowed.  NULL
      nodemask is interpreted as cpuset_current_mems_allowed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9af744d7
    • D
      powerpc: do not make the entire heap executable · 16e72e9b
      Denys Vlasenko 提交于
      On 32-bit powerpc the ELF PLT sections of binaries (built with
      --bss-plt, or with a toolchain which defaults to it) look like this:
      
        [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
        [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
        [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4
      
      Which results in an ELF load header:
      
        Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
        LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000
      
      This is all correct, the load region containing the PLT is marked as
      executable.  Note that the PLT starts at 0002b00c but the file mapping
      ends at 0002aff8, so the PLT falls in the 0 fill section described by
      the load header, and after a page boundary.
      
      Unfortunately the generic ELF loader ignores the X bit in the load
      headers when it creates the 0 filled non-file backed mappings.  It
      assumes all of these mappings are RW BSS sections, which is not the case
      for PPC.
      
      gcc/ld has an option (--secure-plt) to not do this, this is said to
      incur a small performance penalty.
      
      Currently, to support 32-bit binaries with PLT in BSS kernel maps
      *entire brk area* with executable rights for all binaries, even
      --secure-plt ones.
      
      Stop doing that.
      
      Teach the ELF loader to check the X bit in the relevant load header and
      create 0 filled anonymous mappings that are executable if the load
      header requests that.
      
      Test program showing the difference in /proc/$PID/maps:
      
      int main() {
      	char buf[16*1024];
      	char *p = malloc(123); /* make "[heap]" mapping appear */
      	int fd = open("/proc/self/maps", O_RDONLY);
      	int len = read(fd, buf, sizeof(buf));
      	write(1, buf, len);
      	printf("%p\n", p);
      	return 0;
      }
      
      Compiled using: gcc -mbss-plt -m32 -Os test.c -otest
      
      Unpatched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
      f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
      0x10690008
      
      Patched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                        ^^^^ this has changed
      f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
      0x10180008
      
      The patch was originally posted in 2012 by Jason Gunthorpe
      and apparently ignored:
      
      https://lkml.org/lkml/2012/9/30/138
      
      Lightly run-tested.
      
      Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.comSigned-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16e72e9b
    • F
      powerpc: Remove leftover cputime_to_nsecs call causing build error · 9f3768e0
      Frédéric Weisbecker 提交于
      This type conversion is a leftover that got ignored during the kcpustat
      conversion to nanosecs, resulting in build breakage with config having
      CONFIG_NO_HZ_FULL=y.
      
      	arch/powerpc/kernel/time.c: In function 'running_clock':
      	arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
      	  return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
      
      All we need is to remove it.
      
      Fixes: e7f340ca ("powerpc, sched/cputime: Remove unused cputime definitions")
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9f3768e0
    • A
      powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU · fda2d27d
      Aneesh Kumar K.V 提交于
      We will set LPCR with correct value for radix during int. This make sure we
      start with a sanitized value of LPCR. In case of kexec, cpus can have LPCR
      value based on the previous translation mode we were running.
      
      Fixes: fe036a06 ("powerpc/64/kexec: Fix MMU cleanup on radix")
      Cc: stable@vger.kernel.org # v4.9+
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fda2d27d
    • N
      powerpc/optprobes: Fix TOC handling in optprobes trampoline · f558b37b
      Naveen N. Rao 提交于
      Optprobes on powerpc are limited to kernel text area. We decided to also
      optimize kretprobe_trampoline since that is also in kernel text area.
      However,we failed to take into consideration the fact that the same
      trampoline is also used to catch function returns from kernel modules.
      As an example:
      
        $ sudo modprobe kobject-example
        $ sudo bash -c "echo 'r foo_show+8' > /sys/kernel/debug/tracing/kprobe_events"
        $ sudo bash -c "echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable"
        $ sudo cat /sys/kernel/debug/kprobes/list
        c000000000041350  k  kretprobe_trampoline+0x0    [OPTIMIZED]
        d000000000e00200  r  foo_show+0x8  kobject_example
        $ cat /sys/kernel/kobject_example/foo
        Segmentation fault
      
      With the below (trimmed) splat in dmesg:
      
        Unable to handle kernel paging request for data at address 0xfec40000
        Faulting instruction address: 0xc000000000041540
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP [c000000000041540] optimized_callback+0x70/0xe0
        LR [c000000000041e60] optinsn_slot+0xf8/0x10000
        Call Trace:
        [c0000000c7327850] [c000000000289af4] alloc_set_pte+0x1c4/0x860 (unreliable)
        [c0000000c7327890] [c000000000041e60] optinsn_slot+0xf8/0x10000
        --- interrupt: 700 at 0xc0000000c7327a80
      	       LR = kretprobe_trampoline+0x0/0x10
        [c0000000c7327ba0] [c0000000003a30d4] sysfs_kf_seq_show+0x104/0x1d0
        [c0000000c7327bf0] [c0000000003a0bb4] kernfs_seq_show+0x44/0x60
        [c0000000c7327c10] [c000000000330578] seq_read+0xf8/0x560
        [c0000000c7327cb0] [c0000000003a1e64] kernfs_fop_read+0x194/0x260
        [c0000000c7327d00] [c0000000002f9954] __vfs_read+0x44/0x1a0
        [c0000000c7327d90] [c0000000002fb4cc] vfs_read+0xbc/0x1b0
        [c0000000c7327de0] [c0000000002fd138] SyS_read+0x68/0x110
        [c0000000c7327e30] [c00000000000b8e0] system_call+0x38/0xfc
      
      Fix this by loading up the kernel TOC before calling into the kernel.
      The original TOC gets restored as part of the usual pt_regs restore.
      
      Fixes: 762df10b ("powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f558b37b
  7. 22 2月, 2017 2 次提交
  8. 21 2月, 2017 3 次提交
    • M
      powerpc/pseries: Advertise Hot Plug Event support to firmware · 3dbbaf20
      Michael Roth 提交于
      With the inclusion of commit 333f7b76 ("powerpc/pseries: Implement
      indexed-count hotplug memory add") and commit 75384347
      ("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
      now have complete handling of the RTAS hotplug event format as described
      by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
      
      This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
      architecture option vector 5, and allows for greater control over
      cpu/memory/pci hot plug/unplug operations.
      
      Existing pseries kernels will utilize this capability based on the
      existence of the /event-sources/hot-plug-events DT property, so we
      only need to advertise it via CAS and do not need a corresponding
      FW_FEATURE_* value to test for.
      Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3dbbaf20
    • D
      powerpc/xmon: Dump memory in CPU endian format · 5e48dc0a
      Douglas Miller 提交于
      Extend the dump command to allow display of 2, 4, and 8 byte words in
      CPU endian format. Also adds dump command for "1 byte values" for the
      sake of symmetry. New commands are:
      
        d1	dump 1 byte values
        d2	dump 2 byte values
        d4	dump 4 byte values
        d8	dump 8 byte values
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NDouglas Miller <dougmill@linux.vnet.ibm.com>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      5e48dc0a
    • N
      powerpc/pseries: Revert 'Auto-online hotplugged memory' · 943db62c
      Nathan Fontenot 提交于
      This reverts commit ec999072 ("powerpc/pseries: Auto-online
      hotplugged memory"), and 9dc51281 ("powerpc: Fix unused function
      warning 'lmb_to_memblock'").
      
      Using the auto-online acpability does online added memory but does not
      update the associated device struct to indicate that the memory is
      online. This causes the pseries memory DLPAR code to fail when trying to
      remove a LMB that was previously removed and added back. This happens
      when validating that the LMB is removable.
      
      This patch reverts to the previous behavior of calling device_online()
      to online the LMB when it is DLPAR added and moves the lmb_to_memblock()
      routine out of CONFIG_MEMORY_HOTREMOVE now that we call it for add.
      
      Fixes: ec999072 ("powerpc/pseries: Auto-online hotplugged memory")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      943db62c
  9. 20 2月, 2017 1 次提交
  10. 18 2月, 2017 4 次提交
    • N
      powerpc/64: Implement clear_bit_unlock_is_negative_byte() · d11914b2
      Nicholas Piggin 提交于
      Commit b91e1302 ("mm: optimize PageWaiters bit use for
      unlock_page()") added a special bitop function to speed up
      unlock_page(). Implement this for 64-bit powerpc.
      
      This improves the unlock_page() core code from this:
      
      	li	9,1
      	lwsync
      1:	ldarx	10,0,3,0
      	andc	10,10,9
      	stdcx.	10,0,3
      	bne-	1b
      	ori	2,2,0
      	ld	9,0(3)
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      To this:
      
      	li	10,1
      	lwsync
      1:	ldarx	9,0,3,0
      	andc	9,9,10
      	stdcx.	9,0,3
      	bne-	1b
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      In a test of elapsed time for dd writing into 16GB of already-dirty
      pagecache on a POWER8 with 4K pages, which has one unlock_page per 4kB
      this patch reduced overhead by 1.1%:
      
          N           Min           Max        Median           Avg        Stddev
      x  19         2.578         2.619         2.594         2.595         0.011
      +  19         2.552         2.592         2.564         2.565         0.008
      Difference at 95.0% confidence
      	-0.030  +/- 0.006
      	-1.142% +/- 0.243%
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Made 64-bit only until I can test it properly on 32-bit]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d11914b2
    • P
      KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now · bcd3bb63
      Paul Mackerras 提交于
      The new HPT resizing code added in commit b5baa687 ("KVM: PPC:
      Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't
      have code to handle the new HPTE format which POWER9 uses.  Thus it
      would be best not to advertise it to userspace on POWER9 systems
      until it works properly.
      
      Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that
      could be hit on POWER9, let's prevent it from being called on POWER9
      for now.
      Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bcd3bb63
    • D
      bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann 提交于
      Long standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same for address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See offwaketime example on
      dumping stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in fast-path that is also shared for symbol/size/offset lookup
      for a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file done via RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aide
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that still a lot
      of systems ship with world read permissions on kallsyms thus addresses
      should not get suddenly exposed for them. If that situation gets
      much better in future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such programs types relevant in this context are for root-only anyway.
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74451e66
    • D
      bpf: remove stubs for cBPF from arch code · 9383191d
      Daniel Borkmann 提交于
      Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
      that a single __weak function in the core that can be overridden
      similarly to the eBPF one. Also remove stale pr_err() mentions
      of bpf_jit_compile.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9383191d
  11. 17 2月, 2017 3 次提交