提交 · 0a87bc2e58a687fe14817b9c7f73e68570ba33c6 · openeuler / Kernel

14 11月, 2017 23 次提交

Merge branch 'bpf-improve-verifier-ARG_CONST_SIZE_OR_ZERO-semantics' · 0a87bc2e

由 David S. Miller 提交于 11月 14, 2017

Yonghong Song says:

====================
bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics

This patch set intends to change verifier ARG_CONST_SIZE_OR_ZERO
semantics so that simpler bpf programs can be written with verifier
acceptance. Patch #1 comment provided the detailed examples and
the patch itself implements the new semantics. Patch #2
changes bpf_probe_read helper arg2 type from
ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO. Patch #3 fixed a few
test cases and added some for better coverage.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a87bc2e

bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change · b6ff6391

由 Yonghong Song 提交于 11月 12, 2017

Fix a few test cases to allow non-NULL map/packet/stack pointer
with size = 0. Change a few tests using bpf_probe_read to use
bpf_probe_write_user so ARG_CONST_SIZE arg can still be properly
tested. One existing test case already covers size = 0 with non-NULL
packet pointer, so add additional tests so all cases of
size = 0 and 0 <= size <= legal_upper_bound with non-NULL
map/packet/stack pointer are covered.
Signed-off-by: NYonghong Song <yhs@fb.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6ff6391

bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO · 9c019e2b

由 Yonghong Song 提交于 11月 12, 2017

The helper bpf_probe_read arg2 type is changed
from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit
size-0 buffer. Together with newer ARG_CONST_SIZE_OR_ZERO
semantics which allows non-NULL buffer with size 0,
this allows simpler bpf programs with verifier acceptance.
The previous commit which changes ARG_CONST_SIZE_OR_ZERO semantics
has details on examples.
Signed-off-by: NYonghong Song <yhs@fb.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c019e2b

bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics · 9fd29c08

由 Yonghong Song 提交于 11月 12, 2017

For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
access size to be 0 when accessing the previous argument (arg).
Right now, it requires the arg needs to be NULL when size passed
is 0 or could be 0. It also requires a non-NULL arg when the size
is proved to be non-0.

This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior
such that for size-0 or possible size-0, it is not required
the arg equal to NULL.

There are a couple of reasons for this semantics change, and
all of them intends to simplify user bpf programs which
may improve user experience and/or increase chances of
verifier acceptance. Together with the next patch which
changes bpf_probe_read arg2 type from ARG_CONST_SIZE to
ARG_CONST_SIZE_OR_ZERO, the following two examples, which
fail the verifier currently, are able to get verifier acceptance.

Example 1:
   unsigned long len = pend - pstart;
   len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
   len &= MAX_PAYLOAD_LEN;
   bpf_probe_read(data->payload, len, pstart);

It does not have test for "len > 0" and it failed the verifier.
Users may not be aware that they have to add this test.
Converting the bpf_probe_read helper to have
ARG_CONST_SIZE_OR_ZERO helps the above code get
verifier acceptance.

Example 2:
  Here is one example where llvm "messed up" the code and
  the verifier fails.

......
   unsigned long len = pend - pstart;
   if (len > 0 && len <= MAX_PAYLOAD_LEN)
     bpf_probe_read(data->payload, len, pstart);
......

The compiler generates the following code and verifier fails:
......
39: (79) r2 = *(u64 *)(r10 -16)
40: (1f) r2 -= r8
41: (bf) r1 = r2
42: (07) r1 += -1
43: (25) if r1 > 0xffe goto pc+3
  R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
  R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
  R8=inv(id=0) R9=inv0 R10=fp0
44: (bf) r1 = r6
45: (bf) r3 = r8
46: (85) call bpf_probe_read#45
R2 min value is negative, either use unsigned or 'var &= const'
......

The compiler optimization is correct. If r1 = 0,
r1 - 1 = 0xffffffffffffffff > 0xffe.  If r1 != 0, r1 - 1 will not wrap.
r1 > 0xffe at insn #43 can actually capture
both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
This however causes an issue in verifier as the value range of arg2
"r2" does not properly get refined and lead to verification failure.

Relaxing bpf_prog_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
allows the following simplied code:
   unsigned long len = pend - pstart;
   if (len <= MAX_PAYLOAD_LEN)
     bpf_probe_read(data->payload, len, pstart);

The llvm compiler will generate less complex code and the
verifier is able to verify that the program is okay.
Signed-off-by: NYonghong Song <yhs@fb.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fd29c08

tcp: allow drivers to tweak TSQ logic · 3a9b76fd

由 Eric Dumazet 提交于 11月 11, 2017

I had many reports that TSQ logic breaks wifi aggregation.

Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

Initial value is 10, meaning that this patch does not change current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)

They would have to use following template :

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Kir Kolyshkin <kir@openvz.org>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a9b76fd

Merge tag 'rxrpc-next-20171111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 166c8818

由 David S. Miller 提交于 11月 14, 2017

David Howells says:

====================
rxrpc: Fixes

Here are some patches that fix some things in AF_RXRPC:

 (1) Prevent notifications from being passed to a kernel service for a call
     that it has ended.

 (2) Fix a null pointer deference that occurs under some circumstances when an
     ACK is generated.

 (3) Fix a number of things to do with call expiration.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

166c8818

bnx2x: fix slowpath null crash · 442866ff

由 Zhu Yanjun 提交于 11月 11, 2017

When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task,
bnx2x_nic_unload and bnx2x_nic_load are executed to shutdown and open
NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem allocates dma
failure. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't
allocate memory" pops out. The variable slowpath is set to NULL.
When shutdown the NIC, the function bnx2x_nic_unload is called. In
the function bnx2x_nic_unload, the following functions are executed.
bnx2x_chip_cleanup
    bnx2x_set_storm_rx_mode
        bnx2x_set_q_rx_mode
            bnx2x_set_q_rx_mode
                bnx2x_config_rx_mode
                    bnx2x_set_rx_mode_e2
In the function bnx2x_set_rx_mode_e2, the variable slowpath is operated.
Then the crash occurs.
To fix this crash, the variable slowpath is checked. And in the function
bnx2x_sp_rtnl_task, after dma memory allocation fails, another shutdown
and open NIC is executed.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: NAriel Elior <aelior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

442866ff

Merge branch 'cxgb4-collect-LE-TCAM-and-SGE-queue-contexts' · 0c3ce16c

由 David S. Miller 提交于 11月 14, 2017

Rahul Lakkireddy says:

====================
cxgb4: collect LE-TCAM and SGE queue contexts

Collect hardware dumps via ethtool --get-dump facility.

Patch 1 collects LE-TCAM dump.

Patch 2 collects SGE queue context dumps.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c3ce16c

cxgb4: collect SGE queue context dump · 9e5c598c

由 Rahul Lakkireddy 提交于 11月 11, 2017

Collect SGE freelist queue and congestion manager contexts.
Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e5c598c

cxgb4: collect LE-TCAM dump · 03e98b91

由 Rahul Lakkireddy 提交于 11月 11, 2017

Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03e98b91

vxlan: fix the issue that neigh proxy blocks all icmpv6 packets · 8bff3685

由 Xin Long 提交于 11月 11, 2017

Commit f1fb08f6 ("vxlan: fix ND proxy when skb doesn't have transport
header offset") removed icmp6_code and icmp6_type check before calling
neigh_reduce when doing neigh proxy.

It means all icmpv6 packets would be blocked by this, not only ns packet.
In Jianlin's env, even ping6 couldn't work through it.

This patch is to bring the icmp6_code and icmp6_type check back and also
removed the same check from neigh_reduce().

Fixes: f1fb08f6 ("vxlan: fix ND proxy when skb doesn't have transport header offset")
Reported-by: NJianlin Shi <jishi@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NVincent Bernat <vincent@bernat.im>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bff3685

xfrm6_tunnel: exit_net cleanup check added · baeb0dbb