1. 22 Jul, 2018 5 commits
  2. 21 Jul, 2018 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f1d66bf9
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-07-20
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix in BPF Makefile to detect llvm-objcopy in a more robust way which is
         needed for pahole's BTF converter and minor UAPI tweaks in BTF_INT_BITS()
         to shrink the mask before eventual UAPI freeze, from Martin.
      
      2) Fix a segfault in bpftool when prog pin id has no further arguments such
         as id value or file specified, from Taeung.
      
      3) Fix powerpc JIT handling of XADD which has jumps to exit path that would
         potentially bypass verifier expectations e.g. with subprog calls. Also add
         a test case to make sure XADD is not mangling src/dst register, from Daniel.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: check RCV_SHUTDOWN in tls_wait_data · fcf4793e
      Doron Roberts-Kedes committed
      The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
      tls_sw_recvmsg may return a positive value in the case where bytes have
      already been copied when the socket is shut down. sk->sk_err has been
      cleared, causing tls_wait_data to hang forever on a subsequent
      invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
      fixes this problem.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-fix-DCTCP-ECE-Ack-series' · f7a6eb1e
      David S. Miller committed
      Yuchung Cheng says:
      
      ====================
      fix DCTCP ECE Ack series
      
      This patch set addresses the fact that the existing DCTCP implementation
      does not fully implement the ACK policy specified in the RFC. It improves
      the responsiveness to CE status changes, particularly on flows with
      small inflight.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: do not delay ACK in DCTCP upon CE status change · a0496ef2
      Yuchung Cheng committed
      Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
      has to be sent immediately so the sender can respond quickly:
      
      """ When receiving packets, the CE codepoint MUST be processed as follows:
      
         1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
             true and send an immediate ACK.
      
         2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
             to false and send an immediate ACK.
      """
      
      Previously the DCTCP implementation could continue to delay the ACK. This
      patch fixes that, implementing the RFC by forcing an immediate ACK.
      
      Tested with this packetdrill script provided by Larry Brakmo:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
         +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
      
      +0.000 > [ect01] . 2:2(0) ack 2001
      // Previously the ACK below would be delayed by 40ms
      +0.000 > [ect01] E. 2:2(0) ack 3001
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
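The two RFC 8257 rules quoted in this commit amount to a one-bit state machine on the receiver. The following is a minimal userspace sketch of that rule only, not the kernel implementation; `struct dctcp_rx` and `dctcp_rx_packet()` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical receiver state; the field mirrors DCTCP.CE from RFC 8257. */
struct dctcp_rx {
    bool ce;    /* CE state of the most recently processed packet */
};

/*
 * Process the CE codepoint of one arriving data packet.
 * Returns true when the receiver must ACK immediately instead of
 * leaving the ACK to the delayed-ACK timer.
 */
static bool dctcp_rx_packet(struct dctcp_rx *rx, bool ce_marked)
{
    if (ce_marked != rx->ce) {
        rx->ce = ce_marked;    /* rules 1 and 2: record the new state */
        return true;           /* and force an immediate ACK */
    }
    return false;              /* no change: normal delayed-ACK policy applies */
}
```

In the packetdrill script above, the `[ce]` packet at sequence 2001:3001 is the state change; under this rule the ACK for 3001 goes out at once rather than after the delayed-ACK timeout.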
    • tcp: do not cancel delay-AcK on DCTCP special ACK · 27cde44a
      Yuchung Cheng committed
      Currently when a DCTCP receiver delays an ACK and receives a
      data packet with a different CE mark from the previous one's, it
      sends two immediate ACKs acking the previous and latest sequences
      respectively (for ECN accounting).
      
      Previously, sending the first ACK could cancel the delayed ACK timer
      (tcp_event_ack_sent). This could subsequently prevent sending the
      second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
      The culprit is that tcp_send_ack() assumes it always acknowledges
      the latest sequence, which is not true for the first special ACK.
      
      The fix is to not make that assumption in tcp_send_ack and to check the
      actual ack sequence before cancelling the delayed ACK. Further, it is
      safer to pass the ack sequence number as a local variable into the
      tcp_send_ack routine, instead of intercepting tp->rcv_nxt, to avoid
      future bugs like this.
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
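The check described in this commit can be modelled in a few lines: the delayed ACK is cancelled only when the ACK that was just sent covers the latest expected sequence. This is a simplified sketch, assuming a hypothetical `struct rx_state` and `ack_sent()` helper; it is not the kernel's `struct tcp_sock` or `tcp_event_ack_sent()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified receiver state (not the kernel's struct tcp_sock). */
struct rx_state {
    uint32_t rcv_nxt;         /* next sequence number expected */
    bool delack_pending;      /* true while a delayed ACK is still owed */
};

/*
 * Called after an ACK carrying ack_seq has been sent.
 * Only an ACK that covers the latest sequence may cancel the delayed ACK;
 * a special ACK for an older sequence leaves it pending.
 */
static void ack_sent(struct rx_state *rx, uint32_t ack_seq)
{
    if (ack_seq != rx->rcv_nxt)
        return;               /* special ACK: keep the delayed ACK armed */
    rx->delack_pending = false;
}
```

With `rcv_nxt` at 3001, sending the special ACK for 2001 leaves the delayed ACK pending, so the second ACK for 3001 is still sent.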
    • tcp: helpers to send special DCTCP ack · 2987babb
      Yuchung Cheng committed
      Refactor and create helpers to send the special ACK in DCTCP.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Jul, 2018 8 commits
    • bpf: Use option "help" in the llvm-objcopy test · 7c3e8b64
      Martin KaFai Lau committed
      I noticed the "--version" option of the llvm-objcopy command has recently
      disappeared from the master llvm branch.  It is currently used as a BTF
      support test in tools/testing/selftests/bpf/Makefile.
      
      This patch replaces it with "--help" which should be
      less error prone in the future.
      
      Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h · 36fc3c8c
      Martin KaFai Lau committed
      This patch shrinks the BTF_INT_BITS() mask.  The current
      btf_int_check_meta() ensures the nr_bits of an integer
      cannot exceed 64.  Hence, it is mostly an uapi cleanup.
      
      The actual btf usage (i.e. seq_show()) is also modified
      to use u8 instead of u16.  The verification (e.g. btf_int_check_meta())
      path stays as is to deal with invalid BTF situation.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
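Because nr_bits cannot exceed 64, an 8-bit field is enough to carry it. The sketch below illustrates extracting such a field from a packed 32-bit word. The `EX_`-prefixed macros and `ex_int_encode()` are hypothetical, and the bit layout (bits in the low byte, offset in bits 16-23, encoding in bits 24-27) is an assumption made for the example rather than a quotation of the UAPI header.

```c
#include <stdint.h>

/*
 * Hypothetical accessors for a packed integer descriptor.
 * The layout is an assumption for this example:
 *   bits 24-27: encoding, bits 16-23: offset, bits 0-7: nr_bits.
 */
#define EX_INT_ENCODING(val) (((val) & 0x0f000000u) >> 24)
#define EX_INT_OFFSET(val)   (((val) & 0x00ff0000u) >> 16)
#define EX_INT_BITS(val)     ((val) & 0x000000ffu)   /* 8 bits hold 0..255 */

/* Pack the three fields into one 32-bit word. */
static uint32_t ex_int_encode(uint8_t encoding, uint8_t offset, uint8_t bits)
{
    return ((uint32_t)(encoding & 0x0f) << 24) |
           ((uint32_t)offset << 16) |
           (uint32_t)bits;
}
```

A 64-bit integer packs as `ex_int_encode(1, 0, 64)`, and `EX_INT_BITS()` recovers 64 from the low byte without needing the wider mask.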
    • tools/bpftool: Fix segfault case regarding 'pin' arguments · 759b94a0
      Taeung Song committed
      Arguments of 'pin' subcommand should be checked
      at the very beginning of do_pin_any().
      Otherwise segfault errors can occur when using
      'map pin' or 'prog pin' commands, so fix it.
      
        # bpftool prog pin id
        Segmentation fault
      
      Fixes: 71bb428f ("tools: bpf: add bpftool")
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reported-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
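The general pattern behind this fix is to validate the argument count before touching any argv entry. A minimal sketch follows, assuming a hypothetical `pin_check_args()` helper that expects the three tokens `id ID FILE`; it is not bpftool's actual do_pin_any() code, and the messages are invented for the example.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the start of a 'pin' handler.
 * Expects exactly three tokens: the keyword "id", the ID value and a FILE path.
 * Returns 0 when argv[0..2] may be read safely, -1 otherwise.
 */
static int pin_check_args(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "too few arguments: 'id ID' and FILE are required\n");
        return -1;
    }
    if (argc > 3) {
        fprintf(stderr, "too many arguments\n");
        return -1;
    }
    (void)argv;    /* argv[0], argv[1] and argv[2] are valid from here on */
    return 0;
}
```

With this check in place, an invocation carrying only `id` fails with an error message instead of dereferencing a missing argument.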
    • net-next/hinic: fix a problem in hinic_xmit_frame() · f7482683
      Zhao Chen committed
      The calculation of "wqe_size" is not correct when the tx queue is busy in
      hinic_xmit_frame().
      
      When there are no free WQEs, the tx flow will unmap the skb buffer, then
      ring the doorbell for the pending packets. But the "wqe_size" which is used
      to calculate the doorbell address is not correct. The wqe size should be
      cleared to 0; otherwise, it will cause a doorbell error.
      
      This patch fixes the problem.
      Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan committed
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at: __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>] rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>] rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ppc-fix' · bb392867
      Alexei Starovoitov committed
      Daniel Borkmann says:
      
      ====================
      This set adds a ppc64 JIT fix for xadd as well as a missing test
      case for verifying whether xadd messes with src/dst reg. Thanks!
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: test case to check whether src/dst regs got mangled by xadd · fa47a16b
      Daniel Borkmann committed
      We currently do not have such a test case in test_verifier selftests
      but it's important to test under bpf_jit_enable=1 to make sure JIT
      implementations do not mistakenly mess with src/dst reg for xadd/{w,dw}.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd · b9c1e60e
      Daniel Borkmann committed
      None of the JITs is allowed to implement exit paths from the BPF
      insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
      we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
      in eBPF to cBPF translation to retain old existing behavior where
      exceptions may occur; they are also tightly controlled by the
      verifier where it disallows some of the features such as BPF to
      BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
      program. During recent review of all BPF_XADD JIT implementations
      I noticed that the ppc64 one is buggy in that it contains two
      jumps to exit paths. This is problematic as this can bypass verifier
      expectations e.g. pointed out in commit f6b1b3bf ("bpf: fix
      subprog verifier bypass by div/mod by 0 exception"). The first
      exit path is obsoleted by the fix in ca369602 ("bpf: allow xadd
      only on aligned memory") anyway, and for the second one we need to
      do a fetch, add and store loop if the reservation from lwarx/ldarx
      was lost in the meantime.
      
      Fixes: 156d0e29 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
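The "fetch, add and store loop" mentioned in this commit can be illustrated in portable C11, where a weak compare-and-exchange that may fail spuriously plays the role of a lost load-reserve/store-conditional reservation. This is a sketch of the retry idea only, not the JIT-emitted powerpc instruction sequence; `xadd_u64()` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Add 'val' to '*addr' atomically by retrying until the store succeeds.
 * atomic_compare_exchange_weak may fail spuriously, which is the portable
 * analogue of losing the reservation; on failure 'old' is refreshed with
 * the current value and the loop simply tries again.
 */
static void xadd_u64(_Atomic uint64_t *addr, uint64_t val)
{
    uint64_t old = atomic_load(addr);

    while (!atomic_compare_exchange_weak(addr, &old, old + val))
        ;    /* retry: a failed store never diverts to an exit path */
}
```

The point of the loop form is that a failed store leads back to another attempt, so control flow never leaves the instruction's own mapping.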
  4. 19 Jul, 2018 21 commits