1. 18 2月, 2017 2 次提交
    • D
      bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann 提交于
      Long standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same for address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See offwaketime example on
      dumping stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in fast-path that is also shared for symbol/size/offset lookup
      for a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file done via RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aide
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that still a lot
      of systems ship with world read permissions on kallsyms thus addresses
      should not get suddenly exposed for them. If that situation gets
      much better in future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such programs types relevant in this context are for root-only anyway.
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74451e66
    • D
      bpf: remove stubs for cBPF from arch code · 9383191d
      Daniel Borkmann 提交于
      Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
      that a single __weak function in the core that can be overridden
      similarly to the eBPF one. Also remove stale pr_err() mentions
      of bpf_jit_compile.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9383191d
  2. 24 1月, 2017 2 次提交
  3. 20 1月, 2017 1 次提交
    • C
      KVM: s390: do not expose random data via facility bitmap · 04478197
      Christian Borntraeger 提交于
      kvm_s390_get_machine() populates the facility bitmap by copying bytes
      from the host results that are stored in a 256 byte array in the prefix
      page. The KVM code does use the size of the target buffer (2k), thus
      copying and exposing unrelated kernel memory (mostly machine check
      related logout data).
      
      Let's use the size of the source buffer instead.  This is ok, as the
      target buffer will always be greater or equal than the source buffer as
      the KVM internal buffers (and thus S390_ARCH_FAC_LIST_SIZE_BYTE) cover
      the maximum possible size that is allowed by STFLE, which is 256
      doublewords. All structures are zero allocated so we can leave bytes
      256-2047 unchanged.
      
      Add a similar fix for kvm_arch_init_vm().
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      [found with smatch]
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      CC: stable@vger.kernel.org
      Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      04478197
  4. 16 1月, 2017 2 次提交
  5. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  6. 25 12月, 2016 3 次提交
  7. 20 12月, 2016 2 次提交
    • H
      s390/kbuild: enable modversions for symbols exported from asm · cabab3f9
      Heiko Carstens 提交于
      s390 version of commit 334bb773 ("x86/kbuild: enable modversions
      for symbols exported from asm") so we get also rid of all these
      warnings:
      
      WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "memcpy" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "memmove" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "memset" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "save_fpu_regs" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "sie64a" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "sie_exit" [vmlinux] version generation failed, symbol will not be versioned.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      cabab3f9
    • M
      s390/vtime: correct system time accounting · 8f2b468a
      Martin Schwidefsky 提交于
      There is a slight misaccounting of system time in vtime_account_user.
      This function is called once per HZ tick in interrupt context.
      The irq_enter function already accounted the system time up to the
      point of the irq_enter call. The system time from irq_enter until
      vtime_account_user/do_account_vtime is reached is irq time but it
      is accounted to the previous context.
      
      Just drop the hardirq offset from arch/s390/kernel/vtime.c.
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      8f2b468a
  8. 14 12月, 2016 10 次提交
  9. 13 12月, 2016 4 次提交
  10. 12 12月, 2016 8 次提交
  11. 09 12月, 2016 1 次提交
  12. 07 12月, 2016 4 次提交