提交 · bd5f5f4ecb78e2698dad655645b6d6a2f7012a8c · openanolis / cloud-kernel

01 6月, 2017 1 次提交

bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode · 71189fa9

由 Alexei Starovoitov 提交于 5月 30, 2017

free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71189fa9

18 2月, 2017 2 次提交

bpf: make jited programs visible in traces · 74451e66

由 Daniel Borkmann 提交于 2月 16, 2017

Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.

This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.

Before:

7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)

After:

7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74451e66

bpf: remove stubs for cBPF from arch code · 9383191d

由 Daniel Borkmann 提交于 2月 16, 2017

Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9383191d

10 2月, 2017 1 次提交

powerpc/bpf: Introduce __PPC_SH64() · c233f597

由 Naveen N. Rao 提交于 2月 08, 2017

Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c233f597

25 1月, 2017 2 次提交

powerpc/bpf: Flush the entire JIT buffer · 10528b9c

由 Naveen N. Rao 提交于 1月 13, 2017

With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
the rest of the space with illegal instructions to mitigate BPF spraying
attacks, while having the actual JIT'ed BPF program at a random location
within the allocated space. Under this scenario, it would be better to
flush the entire allocated buffer rather than just the part containing
the actual program. We already flush the buffer from start to the end of
the BPF program. Extend this to include the illegal instructions after
the BPF program.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

10528b9c

powerpc/bpf: Remove redundant check for non-null image · 052de33c

由 Daniel Borkmann 提交于 1月 13, 2017

We have a check earlier to ensure we don't proceed if image is NULL. As
such, the redundant check can be removed.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
[Added similar changes for classic BPF JIT]
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

052de33c

09 12月, 2016 1 次提交

bpf: xdp: Allow head adjustment in XDP prog · 17bedab2

由 Martin KaFai Lau 提交于 12月 07, 2016

This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header).  It is
done by adding a new XDP helper bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support.  The driver can then decide
to error out during XDP_SETUP_PROG.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17bedab2

04 10月, 2016 3 次提交

powerpc/bpf: Add support for bpf constant blinding · b7b7013c

由 Naveen N. Rao 提交于 9月 24, 2016

In line with similar support for other architectures by Daniel Borkmann.

'MOD Default X' from test_bpf without constant blinding:
84 bytes emitted from JIT compiler (pass:3, flen:7)
d0000000058a4688 + <x>:
   0:	nop
   4:	nop
   8:	std     r27,-40(r1)
   c:	std     r28,-32(r1)
  10:	xor     r8,r8,r8
  14:	xor     r28,r28,r28
  18:	mr      r27,r3
  1c:	li      r8,66
  20:	cmpwi   r28,0
  24:	bne     0x0000000000000030
  28:	li      r8,0
  2c:	b       0x0000000000000044
  30:	divwu   r9,r8,r28
  34:	mullw   r9,r28,r9
  38:	subf    r8,r9,r8
  3c:	rotlwi  r8,r8,0
  40:	li      r8,66
  44:	ld      r27,-40(r1)
  48:	ld      r28,-32(r1)
  4c:	mr      r3,r8
  50:	blr

... and with constant blinding:
140 bytes emitted from JIT compiler (pass:3, flen:11)
d00000000bd6ab24 + <x>:
   0:	nop
   4:	nop
   8:	std     r27,-40(r1)
   c:	std     r28,-32(r1)
  10:	xor     r8,r8,r8
  14:	xor     r28,r28,r28
  18:	mr      r27,r3
  1c:	lis     r2,-22834
  20:	ori     r2,r2,36083
  24:	rotlwi  r2,r2,0
  28:	xori    r2,r2,36017
  2c:	xoris   r2,r2,42702
  30:	rotlwi  r2,r2,0
  34:	mr      r8,r2
  38:	rotlwi  r8,r8,0
  3c:	cmpwi   r28,0
  40:	bne     0x000000000000004c
  44:	li      r8,0
  48:	b       0x000000000000007c
  4c:	divwu   r9,r8,r28
  50:	mullw   r9,r28,r9
  54:	subf    r8,r9,r8
  58:	rotlwi  r8,r8,0
  5c:	lis     r2,-17137
  60:	ori     r2,r2,39065
  64:	rotlwi  r2,r2,0
  68:	xori    r2,r2,39131
  6c:	xoris   r2,r2,48399
  70:	rotlwi  r2,r2,0
  74:	mr      r8,r2
  78:	rotlwi  r8,r8,0
  7c:	ld      r27,-40(r1)
  80:	ld      r28,-32(r1)
  84:	mr      r3,r8
  88:	blr
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b7b7013c

powerpc/bpf: Implement support for tail calls · ce076141

由 Naveen N. Rao 提交于 9月 24, 2016

Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.

To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.

Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.

We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ce076141

powerpc/bpf: Introduce accessors for using the tmp local stack space · 7b847f52

由 Naveen N. Rao 提交于 9月 24, 2016

While at it, ensure that the location of the local save area is
consistent whether or not we setup our own stackframe. This property is
utilised in the next patch that adds support for tail calls.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7b847f52

24 6月, 2016 6 次提交

powerpc/ebpf/jit: Implement JIT compiler for extended BPF · 156d0e29

由 Naveen N. Rao 提交于 6月 22, 2016

PPC64 eBPF JIT compiler.

Enable with:
  echo 1 > /proc/sys/net/core/bpf_jit_enable
or
  echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':

 test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

156d0e29

powerpc/bpf/jit: Isolate classic BPF JIT specifics into a separate header · 6ac0ba5a

由 Naveen N. Rao 提交于 6月 22, 2016

Break out classic BPF JIT specifics into a separate header in
preparation for eBPF JIT implementation. Note that ppc32 will still need
the classic BPF JIT.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6ac0ba5a

powerpc/bpf/jit: A few cleanups · cef1e8cd

由 Naveen N. Rao 提交于 6月 22, 2016

1. Per the ISA, ADDIS actually uses RT, rather than RS. Though
   the result is the same, make the usage clear.
2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL()
   to PPC_MULW() to make the same clear.
3. PPC_STW[U] take the entire 16-bit immediate value and do not require
   word-alignment, per the ISA. Change the macros to use IMM_L().
4. A few white-space cleanups to satisfy checkpatch.pl.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cef1e8cd

powerpc/bpf/jit: Introduce rotate immediate instructions · 277285b8

由 Naveen N. Rao 提交于 6月 22, 2016

Since we will be using the rotate immediate instructions for extended
BPF JIT, let's introduce macros for the same. And since the shift
immediate operations use the rotate immediate instructions, let's redo
those macros to use the newly introduced instructions.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

277285b8

powerpc/bpf/jit: Optimize 64-bit Immediate loads · b1a05787

由 Naveen N. Rao 提交于 6月 22, 2016

Similar to the LI32() optimization, if the value can be represented
in 32-bits, use LI32(). Also handle loading a few specific forms of
immediate values in an optimum manner.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b1a05787

powerpc/bpf/jit: Fix/enhance 32-bit Load Immediate implementation · aaf2f7e0

由 Naveen N. Rao 提交于 6月 22, 2016

The existing LI32() macro can sometimes result in a sign-extended 32-bit
load that does not clear the top 32-bits properly. As an example,
loading 0x7fffffff results in the register containing
0xffffffff7fffffff. While this does not impact classic BPF JIT
implementation (since that only uses the lower word for all operations),
we would like to share this macro between classic BPF JIT and extended
BPF JIT, wherein the entire 64-bit value in the register matters. Fix
this by first doing a shifted LI followed by ORI.

An additional optimization is with loading values between -32768 to -1,
where we now only need a single LI.

The new implementation now generates the same or less number of
instructions.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aaf2f7e0

06 1月, 2016 1 次提交

net: filter: make JITs zero A for SKF_AD_ALU_XOR_X · 55795ef5

由 Rabin Vincent 提交于 1月 05, 2016

The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 34805931 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: NRabin Vincent <rabin@rab.in>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55795ef5

03 10月, 2015 1 次提交

ebpf: migrate bpf_prog's flags to bitfield · a91263d5

由 Daniel Borkmann 提交于 9月 30, 2015

As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a91263d5

21 2月, 2015 3 次提交

ppc: bpf: Add SKF_AD_CPU for ppc32 · 02290948

由 Denis Kirjanov 提交于 2月 17, 2015

Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02290948

ppc: bpf: rename bpf_jit_64.S to bpf_jit_asm.S · 2ddadeab

由 Denis Kirjanov 提交于 2月 17, 2015

Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ddadeab

ppc: bpf: update jit to use compatibility macros · 09ca5ab2

由 Denis Kirjanov 提交于 2月 17, 2015

Use helpers from the asm-compat.h to wrap up assembly mnemonics
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09ca5ab2

20 1月, 2015 1 次提交

module: remove mod arg from module_free, rename module_memfree(). · be1f221c

由 Rusty Russell 提交于 1月 20, 2015

Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org

be1f221c

19 11月, 2014 1 次提交

PPC: bpf_jit_comp: Unify BPF_MOD | BPF_X and BPF_DIV | BPF_X · cadaecd2

由 Denis Kirjanov 提交于 11月 17, 2014

Reduce duplicated code by unifying
BPF_ALU | BPF_MOD | BPF_X and BPF_ALU | BPF_DIV | BPF_X

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Daniel Borkmann<dborkman@redhat.com>
CC: Philippe Bergheaud<felix@linux.vnet.ibm.com>
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cadaecd2

12 11月, 2014 1 次提交

PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction · 5b61c4db

由 Denis Kirjanov 提交于 11月 10, 2014

Add BPF extension SKF_AD_HATYPE to ppc JIT to check
the hw type of the interface

Before:
[   57.723666] test_bpf: #20 LD_HATYPE
[   57.723675] BPF filter opcode 0020 (@0) unsupported
[   57.724168] 48 48 PASS

After:
[  103.053184] test_bpf: #20 LD_HATYPE 7 6 PASS

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Daniel Borkmann<dborkman@redhat.com>
CC: Philippe Bergheaud<felix@linux.vnet.ibm.com>
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>

v2: address Alexei's comments
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b61c4db

04 11月, 2014 1 次提交

PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction · 4e235761

由 Denis Kirjanov 提交于 10月 30, 2014

Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load
skb->pkt_type field.

Before:
[   88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS
[   88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS

After:
[   80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS
[   80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Michael Ellerman<mpe@ellerman.id.au>
Cc: Matt Evans <matt@ozlabs.org>
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>

v2: Added test rusults
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e235761

10 9月, 2014 1 次提交

net: bpf: be friendly to kmemcheck · 286aad3c

由 Daniel Borkmann 提交于 9月 08, 2014

Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.

We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.

As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

286aad3c

06 9月, 2014 1 次提交

net: bpf: make eBPF interpreter images read-only · 60a3b225

由 Daniel Borkmann 提交于 9月 02, 2014

With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate.  This patch moves the to be interpreted bytecode to
read-only pages.

In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").

This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).

Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.

Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables.  Brad Spengler proposed the same idea
via Twitter during development of this patch.

Joint work with Hannes Frederic Sowa.
Suggested-by: NBrad Spengler <spender@grsecurity.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60a3b225

03 8月, 2014 1 次提交

net: filter: split 'struct sk_filter' into socket and bpf parts · 7ae457c1

由 Alexei Starovoitov 提交于 7月 30, 2014

clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix

split 'struct sk_filter' into
struct sk_filter {
	atomic_t        refcnt;
	struct rcu_head rcu;
	struct bpf_prog *prog;
};
and
struct bpf_prog {
        u32                     jited:1,
                                len:31;
        struct sock_fprog_kern  *orig_prog;
        unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                            const struct bpf_insn *filter);
        union {
                struct sock_filter      insns[0];
                struct bpf_insn         insnsi[0];
                struct work_struct      work;
        };
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases

split SK_RUN_FILTER macro into:
    SK_RUN_FILTER to be used with 'struct sk_filter *' and
    BPF_PROG_RUN to be used with 'struct bpf_prog *'

__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function

also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:

sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter

API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet

API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ae457c1

28 6月, 2014 2 次提交

powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test · dba63115

由 Denis Kirjanov 提交于 6月 25, 2014

We have to return the boolean here if the tag presents
or not, not just ANDing the TCI with the mask which results to:

[  709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT
[  709.412245] ret 4096 != 1
[  709.412332] ret 4096 != 1
[  709.412333] FAIL (2 times)
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dba63115

powerpc: bpf: Use correct mask while accessing the VLAN tag · 3fc60aa0

由 Denis Kirjanov 提交于 6月 25, 2014

To get a full tag (and not just a VID) we should access the TCI
except the VLAN_TAG_PRESENT field (which means that 802.1q header
is present). Also ensure that the VLAN_TAG_PRESENT stay on its place
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fc60aa0

02 6月, 2014 1 次提交

net: filter: get rid of BPF_S_* enum · 34805931

由 Daniel Borkmann 提交于 5月 29, 2014

This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.

Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.

Before JIT conversion is being done, each compiler checks if A
is being loaded at startup to obtain information if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimalize
code changes in the classic JITs, we have introduced bpf_anc_helper().

Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.

Joint work with Alexei Starovoitov.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: NChema Gonzalez <chemag@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34805931

31 3月, 2014 1 次提交

net: filter: add jited flag to indicate jit compiled filters · f8bbbfc3

由 Daniel Borkmann 提交于 3月 28, 2014

This patch adds a jited flag into sk_filter struct in order to indicate
whether a filter is currently jited or not. The size of sk_filter is
not being expanded as the 32 bit 'len' member allows upper bits to be
reused since a filter can currently only grow as large as BPF_MAXINSNS.

Therefore, there's enough room also for other in future needed flags to
reuse 'len' field if necessary. The jited flag also allows for having
alternative interpreter functions running as currently, we can only
detect jit compiled filters by testing fp->bpf_func to not equal the
address of sk_run_filter().

Joint work with Alexei Starovoitov.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8bbbfc3

27 3月, 2014 1 次提交

net: Rename skb->rxhash to skb->hash · 61b905da

由 Tom Herbert 提交于 3月 24, 2014

The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61b905da

16 1月, 2014 1 次提交

bpf: do not use reciprocal divide · aee636c4

由 Eric Dumazet 提交于 1月 15, 2014

At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aee636c4

31 10月, 2013 2 次提交

powerpc/bpf: Support MOD operation · b0c06d33

由 Vladimir Murzin 提交于 9月 28, 2013

commit b6069a95 (filter: add MOD operation) added generic
support for modulus operation in BPF.

This patch brings JIT support for PPC64
Signed-off-by: NVladimir Murzin <murzin.v@gmail.com>
Acked-by: NMatt Evans <matt@ozlabs.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b0c06d33

powerpc/bpf: BPF JIT compiler for 64-bit Little Endian · 9c662cad

由 Philippe Bergheaud 提交于 9月 24, 2013

This enables the Berkeley Packet Filter JIT compiler
for the PowerPC running in 64bit Little Endian.
Signed-off-by: NPhilippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

9c662cad

08 10月, 2013 1 次提交

net: fix unsafe set_memory_rw from softirq · d45ed4a4

由 Alexei Starovoitov 提交于 10月 04, 2013

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d45ed4a4

21 5月, 2013 1 次提交

ppc: bpf_jit: can call module_free() from any context · ed900ffb

由 Daniel Borkmann 提交于 5月 20, 2013

Followup patch on module_free()/vfree() that takes care of the rest, so
no longer this workaround with work_struct is needed.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matt Evans <matt@ozlabs.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed900ffb

22 3月, 2013 1 次提交

filter: bpf_jit_comp: refactor and unify BPF JIT image dump output · 79617801

由 Daniel Borkmann 提交于 3月 21, 2013

If bpf_jit_enable > 1, then we dump the emitted JIT compiled image
after creation. Currently, only SPARC and PowerPC has similar output
as in the reference implementation on x86_64. Make a small helper
function in order to reduce duplicated code and make the dump output
uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM
flen, pass and proglen are currently not shown, but would be
interesting to know as well), also for future BPF JIT implementations
on other archs.

Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Eric Dumazet <eric.dumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79617801

18 11月, 2012 1 次提交

PPC: net: bpf_jit_comp: add VLAN instructions for BPF JIT · 5082dfb7

由 Daniel Borkmann 提交于 11月 08, 2012

This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.
Signed-off-by: NDaniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NMatt Evans <matt@ozlabs.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5082dfb7

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功