1. 02 3月, 2017 12 次提交
  2. 28 2月, 2017 5 次提交
  3. 25 2月, 2017 4 次提交
  4. 23 2月, 2017 5 次提交
    • M
      lib/show_mem.c: teach show_mem to work with the given nodemask · 9af744d7
      Michal Hocko 提交于
      show_mem() allows to filter out node specific data which is irrelevant
      to the allocation request via SHOW_MEM_FILTER_NODES.  The filtering is
      done in skip_free_areas_node which skips all nodes which are not in the
      mems_allowed of the current process.  This works most of the time as
      expected because the nodemask shouldn't be outside of the allocating
      task but there are some exceptions.  E.g.  memory hotplug might want to
      request allocations from outside of the allowed nodes (see
      new_node_page).
      
      Get rid of this hardcoded behavior and push the allocation mask down the
      show_mem path and use it instead of cpuset_current_mems_allowed.  NULL
      nodemask is interpreted as cpuset_current_mems_allowed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9af744d7
    • D
      powerpc: do not make the entire heap executable · 16e72e9b
      Denys Vlasenko 提交于
      On 32-bit powerpc the ELF PLT sections of binaries (built with
      --bss-plt, or with a toolchain which defaults to it) look like this:
      
        [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
        [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
        [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4
      
      Which results in an ELF load header:
      
        Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
        LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000
      
      This is all correct, the load region containing the PLT is marked as
      executable.  Note that the PLT starts at 0002b00c but the file mapping
      ends at 0002aff8, so the PLT falls in the 0 fill section described by
      the load header, and after a page boundary.
      
      Unfortunately the generic ELF loader ignores the X bit in the load
      headers when it creates the 0 filled non-file backed mappings.  It
      assumes all of these mappings are RW BSS sections, which is not the case
      for PPC.
      
      gcc/ld has an option (--secure-plt) to not do this, this is said to
      incur a small performance penalty.
      
      Currently, to support 32-bit binaries with PLT in BSS kernel maps
      *entire brk area* with executable rights for all binaries, even
      --secure-plt ones.
      
      Stop doing that.
      
      Teach the ELF loader to check the X bit in the relevant load header and
      create 0 filled anonymous mappings that are executable if the load
      header requests that.
      
      Test program showing the difference in /proc/$PID/maps:
      
      int main() {
      	char buf[16*1024];
      	char *p = malloc(123); /* make "[heap]" mapping appear */
      	int fd = open("/proc/self/maps", O_RDONLY);
      	int len = read(fd, buf, sizeof(buf));
      	write(1, buf, len);
      	printf("%p\n", p);
      	return 0;
      }
      
      Compiled using: gcc -mbss-plt -m32 -Os test.c -otest
      
      Unpatched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
      f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
      0x10690008
      
      Patched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                        ^^^^ this has changed
      f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
      0x10180008
      
      The patch was originally posted in 2012 by Jason Gunthorpe
      and apparently ignored:
      
      https://lkml.org/lkml/2012/9/30/138
      
      Lightly run-tested.
      
      Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.comSigned-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16e72e9b
    • F
      powerpc: Remove leftover cputime_to_nsecs call causing build error · 9f3768e0
      Frédéric Weisbecker 提交于
      This type conversion is a leftover that got ignored during the kcpustat
      conversion to nanosecs, resulting in build breakage with config having
      CONFIG_NO_HZ_FULL=y.
      
      	arch/powerpc/kernel/time.c: In function 'running_clock':
      	arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
      	  return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
      
      All we need is to remove it.
      
      Fixes: e7f340ca ("powerpc, sched/cputime: Remove unused cputime definitions")
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9f3768e0
    • A
      powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU · fda2d27d
      Aneesh Kumar K.V 提交于
      We will set LPCR with correct value for radix during int. This make sure we
      start with a sanitized value of LPCR. In case of kexec, cpus can have LPCR
      value based on the previous translation mode we were running.
      
      Fixes: fe036a06 ("powerpc/64/kexec: Fix MMU cleanup on radix")
      Cc: stable@vger.kernel.org # v4.9+
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fda2d27d
    • N
      powerpc/optprobes: Fix TOC handling in optprobes trampoline · f558b37b
      Naveen N. Rao 提交于
      Optprobes on powerpc are limited to kernel text area. We decided to also
      optimize kretprobe_trampoline since that is also in kernel text area.
      However,we failed to take into consideration the fact that the same
      trampoline is also used to catch function returns from kernel modules.
      As an example:
      
        $ sudo modprobe kobject-example
        $ sudo bash -c "echo 'r foo_show+8' > /sys/kernel/debug/tracing/kprobe_events"
        $ sudo bash -c "echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable"
        $ sudo cat /sys/kernel/debug/kprobes/list
        c000000000041350  k  kretprobe_trampoline+0x0    [OPTIMIZED]
        d000000000e00200  r  foo_show+0x8  kobject_example
        $ cat /sys/kernel/kobject_example/foo
        Segmentation fault
      
      With the below (trimmed) splat in dmesg:
      
        Unable to handle kernel paging request for data at address 0xfec40000
        Faulting instruction address: 0xc000000000041540
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP [c000000000041540] optimized_callback+0x70/0xe0
        LR [c000000000041e60] optinsn_slot+0xf8/0x10000
        Call Trace:
        [c0000000c7327850] [c000000000289af4] alloc_set_pte+0x1c4/0x860 (unreliable)
        [c0000000c7327890] [c000000000041e60] optinsn_slot+0xf8/0x10000
        --- interrupt: 700 at 0xc0000000c7327a80
      	       LR = kretprobe_trampoline+0x0/0x10
        [c0000000c7327ba0] [c0000000003a30d4] sysfs_kf_seq_show+0x104/0x1d0
        [c0000000c7327bf0] [c0000000003a0bb4] kernfs_seq_show+0x44/0x60
        [c0000000c7327c10] [c000000000330578] seq_read+0xf8/0x560
        [c0000000c7327cb0] [c0000000003a1e64] kernfs_fop_read+0x194/0x260
        [c0000000c7327d00] [c0000000002f9954] __vfs_read+0x44/0x1a0
        [c0000000c7327d90] [c0000000002fb4cc] vfs_read+0xbc/0x1b0
        [c0000000c7327de0] [c0000000002fd138] SyS_read+0x68/0x110
        [c0000000c7327e30] [c00000000000b8e0] system_call+0x38/0xfc
      
      Fix this by loading up the kernel TOC before calling into the kernel.
      The original TOC gets restored as part of the usual pt_regs restore.
      
      Fixes: 762df10b ("powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f558b37b
  5. 22 2月, 2017 2 次提交
  6. 21 2月, 2017 3 次提交
    • M
      powerpc/pseries: Advertise Hot Plug Event support to firmware · 3dbbaf20
      Michael Roth 提交于
      With the inclusion of commit 333f7b76 ("powerpc/pseries: Implement
      indexed-count hotplug memory add") and commit 75384347
      ("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
      now have complete handling of the RTAS hotplug event format as described
      by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
      
      This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
      architecture option vector 5, and allows for greater control over
      cpu/memory/pci hot plug/unplug operations.
      
      Existing pseries kernels will utilize this capability based on the
      existence of the /event-sources/hot-plug-events DT property, so we
      only need to advertise it via CAS and do not need a corresponding
      FW_FEATURE_* value to test for.
      Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3dbbaf20
    • D
      powerpc/xmon: Dump memory in CPU endian format · 5e48dc0a
      Douglas Miller 提交于
      Extend the dump command to allow display of 2, 4, and 8 byte words in
      CPU endian format. Also adds dump command for "1 byte values" for the
      sake of symmetry. New commands are:
      
        d1	dump 1 byte values
        d2	dump 2 byte values
        d4	dump 4 byte values
        d8	dump 8 byte values
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NDouglas Miller <dougmill@linux.vnet.ibm.com>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      5e48dc0a
    • N
      powerpc/pseries: Revert 'Auto-online hotplugged memory' · 943db62c
      Nathan Fontenot 提交于
      This reverts commit ec999072 ("powerpc/pseries: Auto-online
      hotplugged memory"), and 9dc51281 ("powerpc: Fix unused function
      warning 'lmb_to_memblock'").
      
      Using the auto-online acpability does online added memory but does not
      update the associated device struct to indicate that the memory is
      online. This causes the pseries memory DLPAR code to fail when trying to
      remove a LMB that was previously removed and added back. This happens
      when validating that the LMB is removable.
      
      This patch reverts to the previous behavior of calling device_online()
      to online the LMB when it is DLPAR added and moves the lmb_to_memblock()
      routine out of CONFIG_MEMORY_HOTREMOVE now that we call it for add.
      
      Fixes: ec999072 ("powerpc/pseries: Auto-online hotplugged memory")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      943db62c
  7. 20 2月, 2017 1 次提交
  8. 18 2月, 2017 4 次提交
    • N
      powerpc/64: Implement clear_bit_unlock_is_negative_byte() · d11914b2
      Nicholas Piggin 提交于
      Commit b91e1302 ("mm: optimize PageWaiters bit use for
      unlock_page()") added a special bitop function to speed up
      unlock_page(). Implement this for 64-bit powerpc.
      
      This improves the unlock_page() core code from this:
      
      	li	9,1
      	lwsync
      1:	ldarx	10,0,3,0
      	andc	10,10,9
      	stdcx.	10,0,3
      	bne-	1b
      	ori	2,2,0
      	ld	9,0(3)
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      To this:
      
      	li	10,1
      	lwsync
      1:	ldarx	9,0,3,0
      	andc	9,9,10
      	stdcx.	9,0,3
      	bne-	1b
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      In a test of elapsed time for dd writing into 16GB of already-dirty
      pagecache on a POWER8 with 4K pages, which has one unlock_page per 4kB
      this patch reduced overhead by 1.1%:
      
          N           Min           Max        Median           Avg        Stddev
      x  19         2.578         2.619         2.594         2.595         0.011
      +  19         2.552         2.592         2.564         2.565         0.008
      Difference at 95.0% confidence
      	-0.030  +/- 0.006
      	-1.142% +/- 0.243%
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Made 64-bit only until I can test it properly on 32-bit]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d11914b2
    • P
      KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now · bcd3bb63
      Paul Mackerras 提交于
      The new HPT resizing code added in commit b5baa687 ("KVM: PPC:
      Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't
      have code to handle the new HPTE format which POWER9 uses.  Thus it
      would be best not to advertise it to userspace on POWER9 systems
      until it works properly.
      
      Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that
      could be hit on POWER9, let's prevent it from being called on POWER9
      for now.
      Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bcd3bb63
    • D
      bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann 提交于
      Long standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same for address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See offwaketime example on
      dumping stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in fast-path that is also shared for symbol/size/offset lookup
      for a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file done via RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aide
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that still a lot
      of systems ship with world read permissions on kallsyms thus addresses
      should not get suddenly exposed for them. If that situation gets
      much better in future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such programs types relevant in this context are for root-only anyway.
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74451e66
    • D
      bpf: remove stubs for cBPF from arch code · 9383191d
      Daniel Borkmann 提交于
      Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
      that a single __weak function in the core that can be overridden
      similarly to the eBPF one. Also remove stale pr_err() mentions
      of bpf_jit_compile.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9383191d
  9. 17 2月, 2017 4 次提交