1. 31 May 2020, 8 commits
    • net: qrtr: Allocate workqueue before kernel_bind · c6e08d62
      Chris Lew committed
      A NULL pointer dereference in qrtr_ns_data_ready() is seen if a client
      opens a qrtr socket before qrtr_ns_init() can bind to the control port.
      When the control port is bound, the ENETRESET error is broadcast
      and clients close their sockets. This results in DEL_CLIENT
      packets being sent to the ns and qrtr_ns_data_ready() being called
      before the workqueue has been allocated.
      
      Allocate the workqueue before setting sk_data_ready and binding to the
      control port. This ensures that the work and workqueue structs are
      allocated and initialized before qrtr_ns_data_ready can be called.
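      A minimal userspace sketch of the ordering rule, assuming a toy model: ns_state, ns_init(), ns_exit(), ns_data_ready() and struct workqueue are hypothetical stand-ins, not the qrtr API. The callback dereferences the workqueue unconditionally, so the workqueue has to exist before the callback is installed and the port is bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the nameservice state; not the real qrtr API. */
struct workqueue { int queued; };

struct ns_state {
	struct workqueue *wq;                     /* must exist before callbacks fire */
	void (*data_ready)(struct ns_state *ns);  /* installed last */
};

/* Callback: queues work, so it relies on ns->wq being allocated. */
static void ns_data_ready(struct ns_state *ns)
{
	assert(ns->wq != NULL);  /* the invariant the fix establishes */
	ns->wq->queued++;
}

/* Fixed ordering: allocate the workqueue, then publish the callback,
 * then bind (after which packets may arrive at any time). */
static int ns_init(struct ns_state *ns)
{
	ns->data_ready = NULL;
	ns->wq = calloc(1, sizeof(*ns->wq));
	if (!ns->wq)
		return -1;
	ns->data_ready = ns_data_ready;
	/* the bind step would go here: from now on data_ready may run */
	return 0;
}

static void ns_exit(struct ns_state *ns)
{
	ns->data_ready = NULL;  /* unpublish before freeing */
	free(ns->wq);
	ns->wq = NULL;
}
```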
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Signed-off-by: Chris Lew <clew@codeaurora.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-a-bunch-of-fixes' · e237659c
      David S. Miller committed
      Paolo Abeni says:
      
      ====================
      mptcp: a bunch of fixes
      
      This patch series pulls together a few fixes for MPTCP bugs observed while
      stress-testing with Apache Bench, forced to use MPTCP and multiple
      subflows.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: remove msk from the token container at destruction time. · c5c79763
      Paolo Abeni committed
      Currently we remove the msk from the token container only
      via mptcp_close(). The MPTCP master socket can also be destroyed
      via other paths (e.g. if not yet accepted, when shutting
      down the listener socket). When we hit the latter scenario,
      dangling msk references are left in the token container,
      leading to memory corruption and/or UaF.
      
      This change addresses the issue by moving the token removal
      into the msk destructor.
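      A toy C model of why removal belongs in the destructor: every teardown path (close(), or freeing an unaccepted msk on listener shutdown) funnels into one function that drops the token. The fixed-size table and all function names are hypothetical; the real token container is not an array.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_SLOTS 8

/* Hypothetical msk and token container: a toy model, not the mptcp code. */
struct msk { unsigned int token; int in_container; };

static struct msk *token_table[TOKEN_SLOTS];

static void token_insert(struct msk *m)
{
	assert(m != NULL);
	token_table[m->token % TOKEN_SLOTS] = m;
	m->in_container = 1;
}

static struct msk *token_lookup(unsigned int token)
{
	struct msk *m = token_table[token % TOKEN_SLOTS];

	return (m && m->token == token) ? m : NULL;
}

/* Destructor: runs on every teardown path, so it owns token removal. */
static void msk_destruct(struct msk *m)
{
	unsigned int slot = m->token % TOKEN_SLOTS;

	if (m->in_container && token_table[slot] == m)
		token_table[slot] = NULL;
	m->in_container = 0;
}

/* close() is only one of the paths that ends in the destructor. */
static void msk_close(struct msk *m) { msk_destruct(m); }

/* e.g. a not-yet-accepted msk freed when the listener shuts down */
static void listener_shutdown_free(struct msk *m) { msk_destruct(m); }
```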
      
      Fixes: 79c0949e ("mptcp: Add key generation and token tree")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix race between MP_JOIN and close · 10f6d46c
      Paolo Abeni committed
      If an MP_JOIN subflow completes the 3whs (three-way handshake) while
      another CPU is closing the master msk, we can hit the
      following race:
      
      CPU1                                    CPU2
      
      close()
       mptcp_close
                                              subflow_syn_recv_sock
                                               mptcp_token_get_sock
                                               mptcp_finish_join
                                                inet_sk_state_load
        mptcp_token_destroy
        inet_sk_state_store(TCP_CLOSE)
        __mptcp_flush_join_list()
                                                mptcp_sock_graft
                                                list_add_tail
        sk_common_release
         sock_orphan()
       <socket free>
      
      The MP_JOIN socket will be leaked. Additionally we can hit
      UaF for the msk 'struct socket' referenced via the 'conn'
      field.
      
      This change tries to address the issue by introducing some
      synchronization between the MP_JOIN 3whs and mptcp_close()
      via the join_list spinlock. If we detect that the msk is closing,
      the MP_JOIN socket is closed, too.
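      A sketch of the synchronization idea, assuming a C11 atomic_flag spinlock as a stand-in for the kernel's join_list spinlock; struct msk, msk_init(), finish_join() and msk_close() are hypothetical names. Because close() sets the closing flag under the same lock the join path takes, a join either lands before close() and is flushed by it, or sees the flag and must close its own subflow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the msk join list; names are hypothetical. */
struct msk {
	atomic_flag join_lock;  /* stands in for the join_list spinlock */
	bool closing;           /* set by close() under join_lock */
	int joined;             /* subflows grafted onto the msk */
};

static void lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l)) {
		/* spin */
	}
}

static void unlock(atomic_flag *l) { atomic_flag_clear(l); }

static void msk_init(struct msk *m)
{
	atomic_flag_clear(&m->join_lock);
	m->closing = false;
	m->joined = 0;
}

/* MP_JOIN completion: graft only if close() has not started. Returns
 * false when the caller must close the new subflow itself. */
static bool finish_join(struct msk *m)
{
	bool ok;

	lock(&m->join_lock);
	ok = !m->closing;
	if (ok)
		m->joined++;
	unlock(&m->join_lock);
	return ok;
}

/* close(): mark closing under the same lock and flush earlier joins.
 * Returns the number of flushed subflows. */
static int msk_close(struct msk *m)
{
	int flushed;

	lock(&m->join_lock);
	m->closing = true;
	flushed = m->joined;
	m->joined = 0;
	unlock(&m->join_lock);
	return flushed;
}
```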
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix unblocking connect() · 41be81a8
      Paolo Abeni committed
      Currently, non-blocking connect() on MPTCP sockets fails frequently.
      If mptcp_stream_connect() is invoked to complete a previously
      attempted non-blocking connection, it still tries to create
      the first subflow via __mptcp_socket_create(). If the 3whs has
      completed and the 'can_ack' flag is already set, the latter
      fails with -EINVAL.
      
      This change addresses the issue by checking for a pending connect and
      delegating the completion to the first subflow. Additionally, it
      changes the msk addresses and sk_state only when needed.
      
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_ct: add nat mangle action only for NAT-conntrack · 05aa69e5
      wenxu committed
      Currently the NAT mangle action is added based on comparing the inverted
      and original tuples. It is better to check the IPS_NAT_MASK flags first,
      to avoid an unnecessary memcmp() for non-NAT conntrack entries.
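      A sketch of the cheaper check order. The IPS_* values mirror the kernel's conntrack status bits (IPS_SRC_NAT is bit 4, IPS_DST_NAT bit 5); struct tuple and needs_nat_mangle() are hypothetical simplifications of the real conntrack tuple and act_ct code, and the call counter only makes the skipped comparison visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local mirrors of the conntrack NAT status bits. */
#define IPS_SRC_NAT  (1UL << 4)
#define IPS_DST_NAT  (1UL << 5)
#define IPS_NAT_MASK (IPS_SRC_NAT | IPS_DST_NAT)

/* Hypothetical, simplified tuple (no padding, so memcmp() is safe). */
struct tuple { unsigned int saddr, daddr; unsigned short sport, dport; };

static int memcmp_calls;  /* counts comparisons actually performed */

static bool needs_nat_mangle(unsigned long status,
			     const struct tuple *orig,
			     const struct tuple *inv_reply)
{
	/* Cheap flag test first: non-NAT entries never reach memcmp(). */
	if (!(status & IPS_NAT_MASK))
		return false;
	memcmp_calls++;
	return memcmp(orig, inv_reply, sizeof(*orig)) != 0;
}
```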
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devinet: fix memleak in inetdev_init() · 1b49cd71
      Yang Yingliang committed
      When devinet_sysctl_register() fails, the memory allocated
      in neigh_parms_alloc() should be freed.
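      The usual cure for this class of leak is a goto-based unwind in reverse order of setup. A minimal sketch with hypothetical stand-ins for neigh_parms_alloc() and devinet_sysctl_register(); the live_parms counter only exists to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for neigh_parms_alloc()/devinet_sysctl_register(). */
static int live_parms;          /* outstanding allocations */
static int sysctl_should_fail;  /* test knob */

static void *parms_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_parms++;
	return p;
}

static void parms_release(void *p)
{
	assert(live_parms > 0);
	live_parms--;
	free(p);
}

static int sysctl_register(void) { return sysctl_should_fail ? -1 : 0; }

/* Unwind in reverse order of setup: each label frees only what was
 * successfully set up before the failing step. */
static int dev_init(void **out)
{
	void *parms = parms_alloc();

	if (!parms)
		return -1;
	if (sysctl_register() < 0)
		goto err_release_parms;  /* the cleanup the fix adds */
	*out = parms;
	return 0;

err_release_parms:
	parms_release(parms);
	return -1;
}
```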
      
      Fixes: 20e61da7 ("ipv4: fail early when creating netdev named all or default")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_vsock: Fix race condition in virtio_transport_recv_pkt · 8692cefc
      Jia He committed
      When a client on the host tries to connect(SOCK_STREAM, O_NONBLOCK) to a
      server on the guest, the kernel panics on a ThunderX2 (armv8a server):
      
      [  463.718844] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  463.718848] Mem abort info:
      [  463.718849]   ESR = 0x96000044
      [  463.718852]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  463.718853]   SET = 0, FnV = 0
      [  463.718854]   EA = 0, S1PTW = 0
      [  463.718855] Data abort info:
      [  463.718856]   ISV = 0, ISS = 0x00000044
      [  463.718857]   CM = 0, WnR = 1
      [  463.718859] user pgtable: 4k pages, 48-bit VAs, pgdp=0000008f6f6e9000
      [  463.718861] [0000000000000000] pgd=0000000000000000
      [  463.718866] Internal error: Oops: 96000044 [#1] SMP
      [...]
      [  463.718977] CPU: 213 PID: 5040 Comm: vhost-5032 Tainted: G           O      5.7.0-rc7+ #139
      [  463.718980] Hardware name: GIGABYTE R281-T91-00/MT91-FS1-00, BIOS F06 09/25/2018
      [  463.718982] pstate: 60400009 (nZCv daif +PAN -UAO)
      [  463.718995] pc : virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.718999] lr : virtio_transport_recv_pkt+0x1fc/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.719000] sp : ffff80002dbe3c40
      [...]
      [  463.719025] Call trace:
      [  463.719030]  virtio_transport_recv_pkt+0x4c8/0xd40 [vmw_vsock_virtio_transport_common]
      [  463.719034]  vhost_vsock_handle_tx_kick+0x360/0x408 [vhost_vsock]
      [  463.719041]  vhost_worker+0x100/0x1a0 [vhost]
      [  463.719048]  kthread+0x128/0x130
      [  463.719052]  ret_from_fork+0x10/0x18
      
      The race condition is as follows:
      Task1                                Task2
      =====                                =====
      __sock_release                       virtio_transport_recv_pkt
        __vsock_release                      vsock_find_bound_socket (found sk)
          lock_sock_nested
          vsock_remove_sock
          sock_orphan
            sk_set_socket(sk, NULL)
          sk->sk_shutdown = SHUTDOWN_MASK
          ...
          release_sock
                                          lock_sock
                                             virtio_transport_recv_connecting
                                               sk->sk_socket->state (panic!)
      
      The root cause is that vsock_find_bound_socket() cannot hold lock_sock,
      so there is a small race window between vsock_find_bound_socket() and
      lock_sock(). If __vsock_release() is running in another task,
      sk->sk_socket may be set to NULL in the meantime.

      This fixes it by checking sk->sk_shutdown (suggested by Stefano) after
      lock_sock, since sk->sk_shutdown is set to SHUTDOWN_MASK under the
      protection of lock_sock_nested.
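      A single-threaded model of the check, assuming simplified types: struct sk, lock_sock(), vsock_release() and recv_pkt() here are hypothetical, and the lock is just a flag, because the point is the order of operations: re-check sk_shutdown after taking the lock and before dereferencing sk->sk_socket. SHUTDOWN_MASK is 3 (RCV_SHUTDOWN | SEND_SHUTDOWN), as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SHUTDOWN_MASK 3  /* RCV_SHUTDOWN | SEND_SHUTDOWN */

/* Toy socket: hypothetical types, lock modelled as a flag. */
struct sock_state { int state; };
struct sk {
	struct sock_state *sk_socket;  /* NULLed by the orphan step */
	int sk_shutdown;
	int locked;
};

static void lock_sock(struct sk *sk)    { assert(!sk->locked); sk->locked = 1; }
static void release_sock(struct sk *sk) { sk->locked = 0; }

/* Release path: orphan and mark shutdown, both under the socket lock. */
static void vsock_release(struct sk *sk)
{
	lock_sock(sk);
	sk->sk_socket = NULL;
	sk->sk_shutdown = SHUTDOWN_MASK;
	release_sock(sk);
}

/* Receive path: the socket was looked up before the lock was taken,
 * so re-check sk_shutdown once the lock is held. */
static int recv_pkt(struct sk *sk)
{
	int ret = -1;

	lock_sock(sk);
	if (sk->sk_shutdown == SHUTDOWN_MASK)
		goto out;  /* released meanwhile: drop the packet */
	ret = sk->sk_socket->state;
out:
	release_sock(sk);
	return ret;
}
```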
      Signed-off-by: Jia He <justin.he@arm.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 May 2020, 17 commits
    • drivers/net/ibmvnic: Update VNIC protocol version reporting · 78468899
      Thomas Falcon committed
      The VNIC protocol version is reported in big-endian format, but it
      is not byteswapped before logging. Fix that, and remove the version
      comparison, as only one protocol version exists at this time.
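      A portable userspace illustration of converting before logging, assuming a 64-bit field (the width is an assumption here): load_be64() plays the role of be64_to_cpu() and format_version() is a hypothetical logging helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a big-endian 64-bit value from raw bytes, independent of the
 * host byte order (the userspace analogue of be64_to_cpu()). */
static uint64_t load_be64(const unsigned char b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Hypothetical logger: format the version only after conversion.
 * Returns the snprintf() length. */
static int format_version(char *buf, size_t len, const unsigned char raw[8])
{
	return snprintf(buf, len, "VNIC protocol version %" PRIu64,
			load_be64(raw));
}
```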
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFC: st21nfca: add missed kfree_skb() in an error path · 3decabdc
      Chuhong Yuan committed
      st21nfca_tm_send_atr_res() fails to call kfree_skb() in an error path.
      Add the missing call to fix it.
      
      Fixes: 1892bf84 ("NFC: st21nfca: Adding P2P support to st21nfca in Initiator & Target mode")
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: fix ARP retransmit timer guard · 96d10d5b
      Hangbin Liu committed
      In commit 19e16d22 ("neigh: support smaller retrans_time settting")
      we added more accurate control for ARP and NS. But for ARP I forgot to
      update the latest guard in neigh_timer_handler(), so the next
      retransmit would be reset to jiffies + HZ/2 if retrans_time was set
      to less than 500ms. Fix it by setting the time_before() check to HZ/100.
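      A worked sketch of the guard, assuming HZ = 1000 so that one jiffy is 1 ms. time_before() follows the kernel's wrap-safe definition; next_retransmit() is a hypothetical simplification of the clamp in neigh_timer_handler(). With retrans_time = 100 ms, the old HZ/2 floor pushes the retransmit out to 500 ms, while the HZ/100 floor keeps it at 100 ms.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000UL  /* assumption: 1000 ticks per second */

/* Wrap-safe jiffies comparison, as in the kernel's time_before(). */
static bool time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical clamp: never schedule earlier than now + floor.
 * The bug used floor = HZ/2; the fix uses HZ/100. */
static unsigned long next_retransmit(unsigned long now, unsigned long retrans,
				     unsigned long floor)
{
	unsigned long next = now + retrans;

	if (time_before(next, now + floor))
		next = now + floor;
	return next;
}
```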
      
      IPv6 does not have this issue.
      Reported-by: Jianwen Ji <jiji@redhat.com>
      Fixes: 19e16d22 ("neigh: support smaller retrans_time settting")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2020-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f2b122d3
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2020-05-28
      
      This series introduces some fixes to mlx5 driver.
      
      v1->v2:
       - Fix bad sha1, Jakub.
       - Added one more patch by Pablo.
         net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta()
      
      Nothing major. The only patch worth mentioning is the suspend/resume crash
      fix, which adds the missing PCI device handlers. The fix is very
      straightforward and, as Dexuan already noted, important for Azure users
      to avoid a crash on VM hibernation. The patch is marked for -stable v4.6
      below.
      
      Conflict note:
      ('net/mlx5e: Fix MLX5_TC_CT dependencies') has a trivial one line conflict
      with current net-next, which can be resolved by simply using the line from
      net-next.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.6
       ('net/mlx5: Fix crash upon suspend/resume')
      
      For -stable v5.6
       ('net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta()')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f9e0ce3d
      David S. Miller committed
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-05-29
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 6 non-merge commits during the last 7 day(s) which contain
      a total of 4 files changed, 55 insertions(+), 34 deletions(-).
      
      The main changes are:
      
      1) minor verifier fix for fmod_ret progs, from Alexei.
      
      2) af_xdp overflow check, from Bjorn.
      
      3) minor verifier fix for 32bit assignment, from John.
      
      4) powerpc has non-overlapping addr space, from Petr.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, selftests: Add a verifier test for assigning 32bit reg states to 64bit ones · cf66c29b
      John Fastabend committed
      Add a verifier test for assigning 32-bit reg states to
      64-bit ones, where the 32-bit reg holds a constant value of 0.

      Without the previous kernel verifier.c fix, the test in
      this patch fails.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/159077335867.6014.2075350327073125374.stgit@john-Precision-5820-Tower
    • bpf, selftests: Verifier bounds tests need to be updated · e3effcdf
      John Fastabend committed
      After the previous fix for zero extension, test_verifier tests #65 and #66
      now fail. Before the fix, we can see the alu32 mov op at insn 10:
      
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=invP(id=0,
                    smin_value=4294967168,smax_value=4294967423,
                    umin_value=4294967168,umax_value=4294967423,
                    var_off=(0x0; 0x1ffffffff),
                    s32_min_value=-2147483648,s32_max_value=2147483647,
                    u32_min_value=0,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      10: (bc) w1 = w1
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=invP(id=0,
                    smin_value=0,smax_value=2147483647,
                    umin_value=0,umax_value=4294967295,
                    var_off=(0x0; 0xffffffff),
                    s32_min_value=-2147483648,s32_max_value=2147483647,
                    u32_min_value=0,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      
      After the fix, because at insn 10 we have 's32_min_value < 0', the following
      step 11 now has 'smax_value=U32_MAX', whereas before we pulled the
      s32_max_value bound into smax_value, as seen above in step 11 with
      smax_value=2147483647.
      
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=inv(id=0,
                   smin_value=4294967168,smax_value=4294967423,
                   umin_value=4294967168,umax_value=4294967423,
                   var_off=(0x0; 0x1ffffffff),
                   s32_min_value=-2147483648, s32_max_value=2147483647,
                   u32_min_value=0,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      10: (bc) w1 = w1
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=inv(id=0,
                   smin_value=0,smax_value=4294967295,
                   umin_value=0,umax_value=4294967295,
                   var_off=(0x0; 0xffffffff),
                   s32_min_value=-2147483648, s32_max_value=2147483647,
                   u32_min_value=0, u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      
      The fallout from this shows up by the time we get to the failing
      instruction at step 14, where previously we had the following:
      
      14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=inv(id=0,
                   smin_value=72057594021150720,smax_value=72057594029539328,
                   umin_value=72057594021150720,umax_value=72057594029539328,
                   var_off=(0xffffffff000000; 0xffffff),
                   s32_min_value=-16777216,s32_max_value=-1,
                   u32_min_value=-16777216,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      14: (0f) r0 += r1
      
      We now have,
      
      14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=inv(id=0,
                   smin_value=0,smax_value=72057594037927935,
                   umin_value=0,umax_value=72057594037927935,
                   var_off=(0x0; 0xffffffffffffff),
                   s32_min_value=-2147483648,s32_max_value=2147483647,
                   u32_min_value=0,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      14: (0f) r0 += r1
      
      In the original step 14, 'smin_value=72057594021150720' trips the logic
      in the verifier function check_reg_sane_offset():
      
       if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) {
      	verbose(env, "value %lld makes %s pointer be out of bounds\n",
      		smin, reg_type_str[type]);
      	return false;
       }
      
      Specifically, it trips the 'smin >= BPF_MAX_VAR_OFF' check. But with the
      fix, at step 14 we have 'smin_value=0', so the above check is not
      tripped because BPF_MAX_VAR_OFF=1<<29.
      
      We have smin_value=0 here because, starting from the smaller smin_value=0
      at step 10, the subtractions at steps 11 and 12 drive smin_value negative.
      
      11: (17) r1 -= 2147483584
      12: (17) r1 -= 2147483584
      13: (77) r1 >>= 8
      
      Then the shift clears the top bit and smin_value is set to 0. Note we still
      have the smax_value in the fixed code so any reads will fail. An alternative
      would be to have reg_sane_check() do both smin and smax value tests.
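      The range test quoted above can be checked against the smin values that appear in this message. BPF_MAX_VAR_OFF = 1<<29 is taken from the text; smin_in_bounds() is a hypothetical wrapper around just that one condition, not the full check_reg_sane_offset().

```c
#include <assert.h>
#include <stdbool.h>

#define BPF_MAX_VAR_OFF (1LL << 29)  /* value stated in the message */

/* Hypothetical wrapper: true when the quoted smin test would NOT fire. */
static bool smin_in_bounds(long long smin)
{
	return !(smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF);
}
```

      The old step 14 value is rejected by the upper test, the new step 14 value of 0 passes, and the step 13 value from the updated test is rejected by the lower test.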
      
      To fix the test, we can omit the 'r1 >>= 8' at line 13. This changes the
      error string, but keeps the intention of the test as suggested by its title,
      "check after truncation of boundary-crossing range". If the verifier logic
      changes, a different value is likely to appear in the error, or the error
      will no longer be thrown, forcing this test to be re-examined. With this
      change we see the new state at step 13.
      
      13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
          R1_w=invP(id=0,
                    smin_value=-4294967168,smax_value=127,
                    umin_value=0,umax_value=18446744073709551615,
                    s32_min_value=-2147483648,s32_max_value=2147483647,
                    u32_min_value=0,u32_max_value=-1)
          R10=fp0 fp-8_w=mmmmmmmm
      
      This gives the expected out-of-bounds error, "value -4294967168 makes map_value
      pointer be out of bounds". However, for the unpriv case we now see a different
      error because of the mixed signed bounds pointer arithmetic. This seems OK, so
      I've only added the unpriv_errstr for this. Another option would have been to
      add to r1 instead of subtracting, but I slightly favor the approach above.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/159077333942.6014.14004320043595756079.stgit@john-Precision-5820-Tower
    • bpf: Fix a verifier issue when assigning 32bit reg states to 64bit ones · 3a71dc36
      John Fastabend committed
      With the latest trunk llvm (llvm 11), I hit a verifier issue for
      test_prog subtest test_verif_scale1.
      
      The following simplified example illustrates the issue:
          w9 = 0  /* R9_w=inv0 */
          r8 = *(u32 *)(r1 + 80)  /* __sk_buff->data_end */
          r7 = *(u32 *)(r1 + 76)  /* __sk_buff->data */
          ......
          w2 = w9 /* R2_w=inv0 */
          r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
          r6 += r2 /* R6_w=inv(id=0) */
          r3 = r6 /* R3_w=inv(id=0) */
          r3 += 14 /* R3_w=inv(id=0) */
          if r3 > r8 goto end
          r5 = *(u32 *)(r6 + 0) /* R6_w=inv(id=0) */
             <== error here: R6 invalid mem access 'inv'
          ...
        end:
      
      In real test_verif_scale1 code, "w9 = 0" and "w2 = w9" are in
      different basic blocks.
      
      In the above, after "r6 += r2", r6 becomes a scalar, which eventually
      caused the memory access error. The correct register state should be
      a pkt pointer.
      
      The imprecise register state starts at "w2 = w9".
      The 32-bit register w9 is 0, but in __reg_assign_32_into_64()
      the 64-bit reg->smax_value is assigned U32_MAX.
      The 64-bit reg->smin_value is 0, and the 64-bit register
      itself remains constant based on reg->var_off.
      
      In adjust_ptr_min_max_vals(), the verifier treats a value as a known
      constant only if smin_val equals smax_val. Since they are not equal,
      the verifier decides r6 is an unknown scalar, which causes the later failure.
      
      LLVM 10 does not have this issue, as it generates different code:
          w9 = 0  /* R9_w=inv0 */
          r8 = *(u32 *)(r1 + 80)  /* __sk_buff->data_end */
          r7 = *(u32 *)(r1 + 76)  /* __sk_buff->data */
          ......
          r6 = r7 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
          r6 += r9 /* R6_w=pkt(id=0,off=0,r=0,imm=0) */
          r3 = r6 /* R3_w=pkt(id=0,off=0,r=0,imm=0) */
          r3 += 14 /* R3_w=pkt(id=0,off=14,r=0,imm=0) */
          if r3 > r8 goto end
          ...
      
      To fix the above issue, we can include zero in the test condition for
      assigning the s32_max_value and s32_min_value to their 64-bit equivalents
      smax_value and smin_value.
      
      Further, fix the condition to avoid doing zero-extension bounds checks
      when s32_min_value <= 0. Otherwise, 32-bit bounds of (-1,1) could be
      incorrectly translated to 64-bit bounds of (0,1), when in fact the -1 min
      value needs to force a U32_MAX bound.
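      One way to encode the rule described above, as a sketch over a reduced, hypothetical bounds struct rather than the kernel's struct bpf_reg_state; the actual patch may be written differently. A constant 0 now stays a constant (smin == smax == 0), and 32-bit bounds of (-1,1) widen to [0, U32_MAX] instead of (0,1).

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical model: only the bounds this discussion needs. */
struct reg_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
	int32_t s32_min_value, s32_max_value;
	uint32_t u32_min_value, u32_max_value;
};

/* Pull the 32-bit signed bounds into the 64-bit ones only when they are
 * non-negative (zero included); otherwise fall back to [0, U32_MAX],
 * the range any zero-extended 32-bit value lies in. */
static void assign_32_into_64(struct reg_bounds *r)
{
	r->umin_value = r->u32_min_value;
	r->umax_value = r->u32_max_value;
	if (r->s32_min_value >= 0 && r->s32_max_value >= 0)
		r->smax_value = r->s32_max_value;
	else
		r->smax_value = UINT32_MAX;
	if (r->s32_min_value >= 0)
		r->smin_value = r->s32_min_value;
	else
		r->smin_value = 0;
}
```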
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/159077331983.6014.5758956193749002737.stgit@john-Precision-5820-Tower
    • bpf: Fix use-after-free in fmod_ret check · 18644cec
      Alexei Starovoitov committed
      Fix the following issue:
      [  436.749342] BUG: KASAN: use-after-free in bpf_trampoline_put+0x39/0x2a0
      [  436.749995] Write of size 4 at addr ffff8881ef38b8a0 by task kworker/3:5/2243
      [  436.750712]
      [  436.752677] Workqueue: events bpf_prog_free_deferred
      [  436.753183] Call Trace:
      [  436.756483]  bpf_trampoline_put+0x39/0x2a0
      [  436.756904]  bpf_prog_free_deferred+0x16d/0x3d0
      [  436.757377]  process_one_work+0x94a/0x15b0
      [  436.761969]
      [  436.762130] Allocated by task 2529:
      [  436.763323]  bpf_trampoline_lookup+0x136/0x540
      [  436.763776]  bpf_check+0x2872/0xa0a8
      [  436.764144]  bpf_prog_load+0xb6f/0x1350
      [  436.764539]  __do_sys_bpf+0x16d7/0x3720
      [  436.765825]
      [  436.765988] Freed by task 2529:
      [  436.767084]  kfree+0xc6/0x280
      [  436.767397]  bpf_trampoline_put+0x1fd/0x2a0
      [  436.767826]  bpf_check+0x6832/0xa0a8
      [  436.768197]  bpf_prog_load+0xb6f/0x1350
      [  436.768594]  __do_sys_bpf+0x16d7/0x3720
      
      prog->aux->trampoline = tr should be set only when prog is valid.
      Otherwise prog freeing will try to put trampoline via prog->aux->trampoline,
      but it may not point to a valid trampoline.
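      A sketch of the ownership rule with a hypothetical refcounted trampoline: the pointer is published in prog only after all checks pass, so the deferred free never puts a trampoline that the error path already released. struct tramp, struct prog, attach() and prog_free() are illustrative names, not the BPF code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy refcounted trampoline and program; names are hypothetical. */
struct tramp { int refcnt; };
struct prog  { struct tramp *trampoline; };

static struct tramp *tramp_get(void)
{
	struct tramp *tr = calloc(1, sizeof(*tr));

	if (tr)
		tr->refcnt = 1;
	return tr;
}

static void tramp_put(struct tramp *tr)
{
	if (tr && --tr->refcnt == 0)
		free(tr);
}

/* Publish tr only on success; on failure drop the local reference and
 * leave prog->trampoline NULL, so prog_free() has nothing stale to put. */
static int attach(struct prog *prog, int checks_pass)
{
	struct tramp *tr = tramp_get();

	if (!tr)
		return -1;
	if (!checks_pass) {
		tramp_put(tr);
		return -1;
	}
	prog->trampoline = tr;
	return 0;
}

static void prog_free(struct prog *prog)
{
	tramp_put(prog->trampoline);
	prog->trampoline = NULL;
}
```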
      
      Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: KP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200529043839.15824-2-alexei.starovoitov@gmail.com
    • net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta() · a683012a
      Pablo Neira Ayuso committed
      The driver reports EINVAL to userspace through netlink on an invalid meta
      match. This is confusing, since EINVAL is usually reserved for malformed
      netlink messages. Replace it with more meaningful codes.
      
      Fixes: 6d65bc64 ("net/mlx5e: Add mlx5e_flower_parse_meta support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix MLX5_TC_CT dependencies · cb9a0641
      Vlad Buslov committed
      Change MLX5_TC_CT config dependencies to include MLX5_ESWITCH instead of
      MLX5_CORE_EN && NET_SWITCHDEV, which are already required by MLX5_ESWITCH.
      Without this change mlx5 fails to compile if user disables MLX5_ESWITCH
      without also manually disabling MLX5_TC_CT.
      
      Fixes: 4c3844d9 ("net/mlx5e: CT: Introduce connection tracking")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Properly set default values when disabling adaptive moderation · ebeaf084
      Tal Gilboa committed
      Add a call to mlx5e_reset_rx/tx_moderation() when enabling/disabling
      adaptive moderation, in order to select the proper default values.
      
      To do so, separate the logic that selects the moderation values from
      the logic that sets the moderation mode (CQE/EQE based).
      
      Fixes: 0088cbbc ("net/mlx5e: Enable CQE based moderation on TX CQ")
      Fixes: 9908aa29 ("net/mlx5e: CQE based moderation")
      Signed-off-by: Tal Gilboa <talgi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix arch depending casting issue in FEC · b623603b
      Aya Levin committed
      Change the type of active_fec to u32 to match the type expected by
      mlx5e_get_fec_mode(). Copy the active_fec and configured_fec values to
      unsigned long before performing bitwise manipulations.
      Take the same approach when configuring FEC over 50G link modes: copy
      the policy into an unsigned long and only then perform bitwise
      operations.
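      A userspace sketch of the casting hazard. bitmap_test() is a hypothetical helper modelled on the kernel's test_bit(), which reads whole unsigned long words; casting the address of a u32 to unsigned long * would read past the 4-byte object on 64-bit hosts (and pick the wrong half on big-endian ones), so the value is widened into a real unsigned long first.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical bitmap helper in the style of test_bit(): it indexes an
 * array of unsigned long words. */
static int bitmap_test(const unsigned long *map, unsigned int nr)
{
	unsigned int bits = sizeof(unsigned long) * CHAR_BIT;

	return (int)((map[nr / bits] >> (nr % bits)) & 1UL);
}

/* Wrong: bitmap_test((const unsigned long *)&active_fec, mode).
 * Right: copy into a genuine unsigned long, then do bit operations. */
static int fec_mode_set(uint32_t active_fec, unsigned int mode)
{
	unsigned long fec = active_fec;

	return bitmap_test(&fec, mode);
}
```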
      
      Fixes: 2132b71f ("net/mlx5e: Advertise globaly supported FEC modes")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Remove warning "devices are not on same switch HW" · 20300aaf
      Maor Dickman committed
      On tunnel decap rule insertion, the indirect mechanism will attempt to
      offload the rule on all uplink representors which will trigger the
      "devices are not on same switch HW, can't offload forwarding" message
      for the uplink which isn't on the same switch HW as the VF representor.
      
      The above flow is valid and shouldn't cause a warning message.
      Fix this by removing the warning and reporting the flow only via extack.
      
      Fixes: 32134847 ("net/mlx5e: Fix allowed tc redirect merged eswitch offload cases")
      Signed-off-by: Maor Dickman <maord@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix stats update for matchall classifier · 0a2a6f49
      Roi Dayan committed
      The stats update takes bytes, packets and lastused, in that order; pass them accordingly.
      
      Fixes: fcb64c0f ("net/mlx5: E-Switch, add ingress rate support")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix crash upon suspend/resume · 8fc3e29b
      Mark Bloch committed
      Currently a Linux system with the mlx5 NIC always crashes upon
      hibernation - suspend/resume.
      
      Add basic callbacks so the NIC can be suspended and resumed.
      
      Fixes: 9603b61d ("mlx5: Move pci device handling from mlx5_ib to mlx5_core")
      Tested-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 942110fd
      David S. Miller committed
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-05-29
      
      1) Several fixes for ESP gro/gso in transport and beet mode when
         IPv6 extension headers are present. From Xin Long.
      
      2) Fix a wrong comment on XFRMA_OFFLOAD_DEV.
         From Antony Antony.
      
      3) Fix sk_destruct callback handling on ESP in TCP encapsulation.
         From Sabrina Dubroca.
      
      4) Fix a use after free in xfrm_output_gso when used with vxlan.
         From Xin Long.
      
      5) Fix secpath handling of VTI when used with IPCOMP.
         From Xin Long.
      
      6) Fix an oops when deleting a x-netns xfrm interface.
         From Nicolas Dichtel.
      
      7) Fix a possible warning on policy updates. We had a case where it was
         possible to add two policies with the same lookup keys.
         From Xin Long.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 May 2020, 5 commits
    • xfrm: fix a NULL-ptr deref in xfrm_local_error · f6a23d85
      Xin Long committed
      This patch is to fix a crash:
      
        [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [ ] general protection fault: 0000 [#1] SMP KASAN PTI
        [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
        [ ] Call Trace:
        [ ]  xfrm6_local_error+0x1eb/0x300
        [ ]  xfrm_local_error+0x95/0x130
        [ ]  __xfrm6_output+0x65f/0xb50
        [ ]  xfrm6_output+0x106/0x46f
        [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
        [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
        [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
        [ ]  dev_hard_start_xmit+0x165/0x820
        [ ]  __dev_queue_xmit+0x1ff0/0x2b90
        [ ]  ip_finish_output2+0xd3e/0x1480
        [ ]  ip_do_fragment+0x182d/0x2210
        [ ]  ip_output+0x1d0/0x510
        [ ]  ip_send_skb+0x37/0xa0
        [ ]  raw_sendmsg+0x1b4c/0x2b80
        [ ]  sock_sendmsg+0xc0/0x110
      
      This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
      skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
      xfrm_local_error(). It then goes to xfrm6_local_error(), where it tries
      to get IPv6 info from an IPv4 sk.
      
      This issue was actually fixed by Commit 628e341f ("xfrm: make local
      error reporting more robust"), but brought back by Commit 844d4874
      ("xfrm: choose protocol family by skb protocol").
      
      So to fix it, we should call xfrm6_local_error() only when skb->protocol
      is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
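      The corrected family choice as a pure function. The constants carry the Linux values (ETH_P_IP 0x0800, ETH_P_IPV6 0x86DD, AF_INET 2, AF_INET6 10) and skb->protocol is treated in host byte order for simplicity. choose_local_error() is hypothetical, and skipping the report in the mismatched case is an assumption: the message only says when xfrm6_local_error() may be called.

```c
#include <assert.h>

/* Local constants with the Linux values; protocol in host order here. */
enum { P_IP = 0x0800, P_IPV6 = 0x86DD };
enum { FAM_INET = 2, FAM_INET6 = 10 };

enum err_path { ERR_NONE, ERR_V4, ERR_V6 };

/* Use the v6 reporter only when both the packet and the socket are
 * IPv6, so a v4 socket sending over a v6 tunnel never reaches it. */
static enum err_path choose_local_error(int skb_protocol, int sk_family)
{
	if (skb_protocol == P_IP)
		return ERR_V4;
	if (skb_protocol == P_IPV6 && sk_family == FAM_INET6)
		return ERR_V6;
	return ERR_NONE;  /* assumption: mismatched case is not reported */
}
```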
      
      Fixes: 844d4874 ("xfrm: choose protocol family by skb protocol")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • net: be more gentle about silly gso requests coming from user · 7c6d2ecb
      Eric Dumazet committed
      Recent change in virtio_net_hdr_to_skb() broke some packetdrill tests.
      
      When the --mss=XXX option is set, packetdrill always provides gso_type and
      gso_size for its inbound packets, regardless of packet size.
      
      	if (packet->tcp && packet->mss) {
      		if (packet->ipv4)
      			gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
      		else
      			gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
      		gso.gso_size = packet->mss;
      	}
      
      Since many other programs could do the same, relax virtio_net_hdr_to_skb()
      so that it no longer returns an error, but instead ignores the gso settings.

      This keeps Willem's intent to make sure no malicious packet can
      reach the gso stack.

      Note that the TCP stack has special logic in tcp_set_skb_tso_segs()
      to clear gso_size for small packets.
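The "ignore instead of reject" policy can be sketched as follows. `struct gso_hint`, `struct pkt_meta` and `apply_gso_hint()` are hypothetical simplifications, not the real virtio_net_hdr_to_skb(); treating a zero `gso_size` as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what a user supplies with a packet. */
struct gso_hint {
    unsigned int gso_type;   /* 0 means "no GSO requested" */
    unsigned int gso_size;   /* requested segment size (MSS) */
};

struct pkt_meta {
    unsigned int payload_len;
    unsigned int gso_type;
    unsigned int gso_size;
};

/*
 * A GSO request for a payload that already fits in one segment is no
 * longer rejected: it is silently dropped, so the packet is accepted but
 * never reaches the GSO stack. Returns true when the packet is accepted.
 */
static bool apply_gso_hint(struct pkt_meta *m, const struct gso_hint *h)
{
    m->gso_type = 0;
    m->gso_size = 0;

    if (h->gso_type == 0)
        return true;                 /* nothing requested */
    if (h->gso_size == 0)
        return false;                /* assumption: zero MSS stays invalid */
    if (m->payload_len <= h->gso_size)
        return true;                 /* silly request: ignore GSO settings */

    m->gso_type = h->gso_type;
    m->gso_size = h->gso_size;
    return true;
}
```

A small packetdrill-style packet carrying an MSS hint is therefore accepted as a plain packet rather than failing with an error.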
      
      Fixes: 6dd912f8 ("net: check untrusted gso_size at kernel entry")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      sctp: check assoc before SCTP_ADDR_{MADE_PRIM, ADDED} event · 45ebf73e
      Jonas Falkevik committed
      Make sure SCTP_ADDR_{MADE_PRIM,ADDED} are sent only for associations
      that have been established.
      
      These events are described in RFC 6458, section 6.1, under
      SCTP_PEER_ADDR_CHANGE:
        This tag indicates that an address that is part of an existing
        association has experienced a change of state (e.g., a failure or
        return to service of the reachability of an endpoint via a
        specific transport address).
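A sketch of the check, assuming association states are ordered so that every state from ESTABLISHED onward counts as established. The enum and `should_notify_addr_change()` are hypothetical, not the kernel's SCTP code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical association states, in SCTP state-machine order. */
enum assoc_state {
    ST_CLOSED,
    ST_COOKIE_WAIT,
    ST_COOKIE_ECHOED,
    ST_ESTABLISHED,
    ST_SHUTDOWN_PENDING,
    ST_SHUTDOWN_SENT,
};

/* Address-change events are emitted only once the handshake completed. */
static bool should_notify_addr_change(enum assoc_state st)
{
    return st >= ST_ESTABLISHED;
}
```

An address added or made primary while the handshake is still in COOKIE_WAIT would thus produce no SCTP_PEER_ADDR_CHANGE notification.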
      Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bonding: Fix reference count leak in bond_sysfs_slave_add. · a068aab4
      Qiushi Wu committed
      kobject_init_and_add() takes a reference even when it fails.
      If this function returns an error, kobject_put() must be called to
      properly clean up the memory associated with the object. The earlier
      commit b8eb7183 fixed a similar problem.
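The error-path pattern can be modelled in userspace with a toy refcount. `struct obj`, `obj_init_and_add()`, `obj_put()` and `add_slave_obj()` are hypothetical stand-ins that mimic the documented behaviour of kobject_init_and_add() and kobject_put(); they are not the kobject API.

```c
#include <assert.h>

/* Toy refcounted object standing in for a struct kobject (hypothetical). */
struct obj {
    int refcount;
    int released;   /* set once the release path has run */
};

/* Mimics the documented behaviour: the reference is taken even on failure. */
static int obj_init_and_add(struct obj *o, int simulate_failure)
{
    o->refcount = 1;
    o->released = 0;
    return simulate_failure ? -1 : 0;
}

static void obj_put(struct obj *o)
{
    if (--o->refcount == 0)
        o->released = 1;   /* a real kobject would free its memory here */
}

/* Correct error path: drop the reference instead of just returning. */
static int add_slave_obj(struct obj *o, int simulate_failure)
{
    int err = obj_init_and_add(o, simulate_failure);

    if (err) {
        obj_put(o);
        return err;
    }
    return 0;
}
```

Returning `err` without the `obj_put()` call would leave the refcount at 1 and the object never released, which is the leak described above.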
      
      Fixes: 07699f9a ("bonding: add sysfs /slave dir for bond slave devices.")
      Signed-off-by: Qiushi Wu <wu000273@umn.edu>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 2200313a
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Variable used uninitialized in __nf_conntrack_update(), from
         Nathan Chancellor.
      
      2) Comparison of unsigned expression in nf_confirm_cthelper().
      
      3) Remove 'const' type qualifier with no effect.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 28 May, 2020: 7 commits
    •
      powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again · d195b1d1
      Petr Mladek committed
      Commit 0ebeea8c ("bpf: Restrict bpf_probe_read{, str}() only
      to archs where they work") made the bpf_probe_read{, str}() functions
      no longer available on architectures where the same logical address
      might have different content in the kernel and user memory mappings.
      These architectures should use the probe_read_{user,kernel}_str helpers.

      For backward compatibility, the problematic functions are still available
      on architectures where the user and kernel address spaces do not
      overlap. This is indicated by CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

      At the moment, these backward-compatible functions are enabled only on
      x86_64, arm, and arm64. Enable them on powerpc as well, since it also
      has non-overlapping address spaces.
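A sketch of the availability rule, with a runtime flag standing in for the compile-time CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE option. The enum values and `helper_available()` are hypothetical names, not the BPF verifier's interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper identifiers. */
enum helper_id {
    HELPER_PROBE_READ,        /* legacy: address could be user or kernel */
    HELPER_PROBE_READ_USER,   /* explicit user-space read */
    HELPER_PROBE_READ_KERNEL, /* explicit kernel read */
};

/*
 * The explicit helpers work everywhere. The legacy helper is offered only
 * when user and kernel address spaces cannot overlap, because only then
 * does an address unambiguously identify which memory to read.
 */
static bool helper_available(enum helper_id id, bool arch_non_overlapping)
{
    switch (id) {
    case HELPER_PROBE_READ_USER:
    case HELPER_PROBE_READ_KERNEL:
        return true;
    case HELPER_PROBE_READ:
        return arch_non_overlapping;
    }
    return false;
}
```

Under this model, selecting the option for powerpc flips `arch_non_overlapping` to true and the legacy helper becomes available again.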
      
      Fixes: 0ebeea8c ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/lkml/20200527122844.19524-1-pmladek@suse.com
    •
      net: dsa: declare lockless TX feature for slave ports · 2b86cb82
      Vladimir Oltean committed
      Consider a platform with the following layout:
      
            Regular NIC
             |
             +----> DSA master for switch port
                     |
                     +----> DSA master for another switch port
      
      After changing DSA back to static lockdep class keys in commit
      1a33e10e ("net: partially revert dynamic lockdep key changes"), this
      kernel splat can be seen:
      
      [   13.361198] ============================================
      [   13.366524] WARNING: possible recursive locking detected
      [   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
      [   13.377874] --------------------------------------------
      [   13.383201] swapper/0/0 is trying to acquire lock:
      [   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.397879]
      [   13.397879] but task is already holding lock:
      [   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.413593]
      [   13.413593] other info that might help us debug this:
      [   13.420140]  Possible unsafe locking scenario:
      [   13.420140]
      [   13.426075]        CPU0
      [   13.428523]        ----
      [   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.440924]
      [   13.440924]  *** DEADLOCK ***
      [   13.440924]
      [   13.446860]  May be due to missing lock nesting notation
      [   13.446860]
      [   13.453668] 6 locks held by swapper/0/0:
      [   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
      [   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
      [   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
      [   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.512000]
      [   13.512000] stack backtrace:
      [   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
      [   13.530421] Call trace:
      [   13.532871]  dump_backtrace+0x0/0x1d8
      [   13.536539]  show_stack+0x24/0x30
      [   13.539862]  dump_stack+0xe8/0x150
      [   13.543271]  __lock_acquire+0x1030/0x1678
      [   13.547290]  lock_acquire+0xf8/0x458
      [   13.550873]  _raw_spin_lock+0x44/0x58
      [   13.554543]  __dev_queue_xmit+0x84c/0xbe0
      [   13.558562]  dev_queue_xmit+0x24/0x30
      [   13.562232]  dsa_slave_xmit+0xe0/0x128
      [   13.565988]  dev_hard_start_xmit+0xf4/0x448
      [   13.570182]  __dev_queue_xmit+0x808/0xbe0
      [   13.574200]  dev_queue_xmit+0x24/0x30
      [   13.577869]  neigh_resolve_output+0x15c/0x220
      [   13.582237]  ip6_finish_output2+0x244/0xb10
      [   13.586430]  __ip6_finish_output+0x1dc/0x298
      [   13.590709]  ip6_output+0x84/0x358
      [   13.594116]  mld_sendpack+0x2bc/0x560
      [   13.597786]  mld_ifc_timer_expire+0x210/0x390
      [   13.602153]  call_timer_fn+0xcc/0x400
      [   13.605822]  run_timer_softirq+0x588/0x6e0
      [   13.609927]  __do_softirq+0x118/0x590
      [   13.613597]  irq_exit+0x13c/0x148
      [   13.616918]  __handle_domain_irq+0x6c/0xc0
      [   13.621023]  gic_handle_irq+0x6c/0x160
      [   13.624779]  el1_irq+0xbc/0x180
      [   13.627927]  cpuidle_enter_state+0xb4/0x4d0
      [   13.632120]  cpuidle_enter+0x3c/0x50
      [   13.635703]  call_cpuidle+0x44/0x78
      [   13.639199]  do_idle+0x228/0x2c8
      [   13.642433]  cpu_startup_entry+0x2c/0x48
      [   13.646363]  rest_init+0x1ac/0x280
      [   13.649773]  arch_call_rest_init+0x14/0x1c
      [   13.653878]  start_kernel+0x490/0x4bc
      
      Lockdep keys themselves were added in commit ab92d68f ("net: core:
      add generic lockdep keys"), and it's very likely that this splat existed
      since then, but I have no real way to check, since this stacked platform
      wasn't supported by mainline back then.
      
      From Taehee's own words:
      
        This patch assumed that all stackable devices have the LLTX flag.
        But DSA doesn't have LLTX, so this splat happened.
        After this patch, DSA shares the same lockdep class key.
        On the nested DSA interface architecture which you illustrated,
        the same lockdep class key will be used in __dev_queue_xmit() because
        DSA doesn't have LLTX.
        So lockdep detects a deadlock because the same lockdep class key is
        used recursively, although different locks are actually used.
        There are some ways to fix this problem.
      
        1. Using the NETIF_F_LLTX flag.
        If possible, using the LLTX flag is a very clear way to do it.
        But I'm sorry, I don't know whether DSA could have LLTX or not.
      
        2. Using dynamic lockdep keys again.
        It means that each interface uses a separate lockdep class key.
        So lockdep will not detect recursive locking.
        But this approach has the problem that it could consume too many
        lockdep class keys.
        Currently, lockdep can have 8192 lockdep class keys.
         - You can see this number with the following command:
           cat /proc/lockdep_stats
           lock-classes:                         1251 [max: 8192]
           ...
           The [max: 8192] is the maximum number of lockdep class keys.
        If too many lockdep class keys are registered, lockdep stops working.
        So using dynamic (separate) lockdep class keys should be considered
        carefully.
        In addition, a routine to update the lockdep class key might be needed.
        (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
      
        3. Using a lockdep subclass.
        A lockdep class key can have 8 subclasses.
        Different subclasses are considered different locks by the lockdep
        infrastructure.
        But subclasses are not counted in "lock-classes".
        So this could avoid stopping the lockdep infrastructure through an
        overflow of lockdep class keys.
        This approach would also need a routine to update the lockdep class key.
        (lockdep_set_subclass())
      
        4. Using a novalidate lockdep class key.
        The lockdep infrastructure supports a novalidate lockdep class key type.
        It means this lock is not validated by the lockdep infrastructure.
        So the splat will not happen, but lockdep couldn't detect a real
        deadlock case because it really doesn't validate it.
        I think this should be used only for really special cases.
        (lockdep_set_novalidate_class())
      
      Further discussion here:
      https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
      
      There appears to be no negative side-effect to declaring lockless TX for
      the DSA virtual interfaces, which means they handle their own locking.
      So that's what we do to make the splat go away.
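To see why LLTX silences the splat, here is a toy model (not lockdep itself). It assumes, as the kernel's transmit path does, that the per-queue TX lock is skipped for devices flagged LLTX, and that all DSA slave devices share one lock class. `struct toy_dev`, `toy_lock()` and `toy_xmit()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* Toy model: a device has a lock class id and an LLTX flag (hypothetical). */
struct toy_dev {
    int lock_class;          /* devices of the same type share one class */
    bool lltx;               /* "driver does its own locking" */
    struct toy_dev *lower;   /* next device down the stack, or NULL */
};

/* Minimal lockdep-like checker: flags re-acquiring a class already held. */
struct held_locks {
    int classes[MAX_HELD];
    int depth;
    bool recursion_reported;
};

/* Records an acquisition; returns false if the table is full (not tracked). */
static bool toy_lock(struct held_locks *h, int cls)
{
    for (int i = 0; i < h->depth; i++)
        if (h->classes[i] == cls)
            h->recursion_reported = true;
    if (h->depth == MAX_HELD)
        return false;
    h->classes[h->depth++] = cls;
    return true;
}

/* Walks the device stack like nested dev_queue_xmit() calls. */
static void toy_xmit(struct toy_dev *dev, struct held_locks *h)
{
    int taken = 0;

    if (!dev->lltx && toy_lock(h, dev->lock_class))
        taken = 1;                   /* models the TX queue lock */
    if (dev->lower)
        toy_xmit(dev->lower, h);
    h->depth -= taken;               /* release on the way back up */
}
```

Walking a DSA-on-DSA stack without LLTX flags a recursive acquisition of the shared class; marking both DSA devices LLTX leaves only the regular NIC's lock taken.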
      
      Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Suggested-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net: dsa: felix: send VLANs on CPU port as egress-tagged · 183be6f9
      Vladimir Oltean committed
      As explained in other commits before (b9cd75e6 and 87b0f983),
      ocelot switches have a single egress-untagged VLAN per port, and the
      driver would deny adding a second one while an egress-untagged VLAN
      already exists.
      
      But on the CPU port (where the VLAN configuration is implicit, because
      there is no net device for the bridge to control), the DSA core attempts
      to add a VLAN using the same flags as were used for the front-panel
      port. This would make adding any untagged VLAN fail due to the CPU port
      rejecting the configuration:
      
      bridge vlan add dev swp0 vid 100 pvid untagged
      [ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1
      [ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16
      
      (note that port 5 is the CPU port and not the front-panel swp0).
      
      So configure this hardware to send all VLANs as egress-tagged towards
      the CPU.
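A sketch of the flag adjustment for the CPU port. The flag values and `port_vlan_flags()` are hypothetical; they only model the bridge's "pvid" and "untagged" attributes, and the sketch assumes the PVID attribute is passed through unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VLAN flag bits, modelled on bridge "pvid" / "untagged". */
#define VLAN_FLAG_PVID     0x1u
#define VLAN_FLAG_UNTAGGED 0x2u

/*
 * Flags to program for a port. The CPU port has no bridge net device of
 * its own, so instead of copying the front-panel flags verbatim it always
 * sends VLANs egress-tagged, avoiding the one-native-VLAN-per-port limit.
 */
static unsigned int port_vlan_flags(unsigned int requested, bool is_cpu_port)
{
    if (is_cpu_port)
        return requested & ~VLAN_FLAG_UNTAGGED;
    return requested;
}
```

For `bridge vlan add dev swp0 vid 100 pvid untagged`, the front-panel port keeps both flags while the CPU port receives only the PVID flag, so it never needs a second native VLAN.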
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bridge: multicast: work around clang bug · b3b6a84c
      Arnd Bergmann committed
      Clang-10 and clang-11 run into a corner case of the register
      allocator on 32-bit ARM, leading to excessive stack usage from
      register spilling:
      
      net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=]
      
      Work around this by marking one of the internal functions as
      noinline_for_stack.
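In the kernel, `noinline_for_stack` is defined as plain `noinline`. The userspace stand-in below reproduces that with the GCC/Clang `__noinline__` attribute; `struct stats`, `add_port_stats()` and `get_stats()` are hypothetical and do not reproduce the br_multicast.c code.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's compiler annotations (GCC/Clang). */
#define noinline           __attribute__((__noinline__))
#define noinline_for_stack noinline

/* Hypothetical per-port counters. */
struct stats { unsigned long rx, tx; };

/*
 * Kept out of line so its locals live in their own short-lived stack
 * frame instead of being merged into (and spilled within) the caller's.
 */
static noinline_for_stack void add_port_stats(struct stats *dst,
                                              const struct stats *src)
{
    struct stats tmp = *src;   /* local temporary confined to this frame */

    dst->rx += tmp.rx;
    dst->tx += tmp.tx;
}

static void get_stats(struct stats *total, const struct stats *ports, int n)
{
    total->rx = total->tx = 0;
    for (int i = 0; i < n; i++)
        add_port_stats(total, &ports[i]);
}
```

The annotation changes only code generation, not behaviour, which is why it is a low-risk workaround for a compiler corner case.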
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      vsock: fix timeout in vsock_accept() · 7e0afbdf
      Stefano Garzarella committed
      accept(2) is an "input" socket interface, so we should use
      SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.

      So this patch replaces sock_sndtimeo() with sock_rcvtimeo() to
      use the right timeout in vsock_accept().
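A sketch of the timeout selection. `struct toy_sock`, `toy_rcvtimeo()` and `accept_timeout()` are hypothetical; the helper mirrors the behaviour of the kernel's sock_rcvtimeo(), which returns 0 for non-blocking callers and the socket's receive timeout otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket holding the two timeouts set via SO_RCVTIMEO / SO_SNDTIMEO. */
struct toy_sock {
    long rcvtimeo;
    long sndtimeo;
};

/* Non-blocking callers do not wait; blocking callers use the receive timeout. */
static long toy_rcvtimeo(const struct toy_sock *sk, bool noblock)
{
    return noblock ? 0 : sk->rcvtimeo;
}

/* accept() waits for incoming connections, so it uses the receive timeout. */
static long accept_timeout(const struct toy_sock *listener, bool noblock)
{
    return toy_rcvtimeo(listener, noblock);
}
```

Before the fix, the equivalent of `accept_timeout()` would have returned `sndtimeo`, so a listener configured only with SO_RCVTIMEO ignored that setting.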
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      nfp: flower: fix used time of merge flow statistics · 5b186cd6
      Heinrich Kuhn committed
      Prior to this change, the correct value for the used counter was
      calculated but neither stored nor, therefore, propagated to user space.
      In use cases such as OVS, at least, this results in active flows being
      removed from the hardware datapath, which causes both unnecessary flow
      tear-down and setup, and packet processing on the host.

      This patch addresses the problem by saving the calculated used value,
      which allows the value to propagate to user space.
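A sketch of the missing store, assuming the merge flow's "used" time is the most recent "used" time among its sub-flows. `struct flow_stats` and `update_merge_used()` are hypothetical, not the nfp driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-flow statistics; "used" is a last-used timestamp. */
struct flow_stats {
    uint64_t pkts;
    uint64_t bytes;
    uint64_t used;
};

/*
 * Assumption: a merge flow's "used" time is the latest of its sub-flows.
 * The bug described above corresponds to computing `used` but never
 * writing it back, so user space kept seeing a stale timestamp.
 */
static void update_merge_used(struct flow_stats *merge,
                              const struct flow_stats *subs, int n)
{
    uint64_t used = merge->used;

    for (int i = 0; i < n; i++)
        if (subs[i].used > used)
            used = subs[i].used;

    merge->used = used;   /* the store that makes the value visible */
}
```

A stale `used` value is what lets an idle-timeout policy (such as OVS's) decide that an active flow has gone idle.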
      
      Found by inspection.
      
      Fixes: aa6ce2ea ("nfp: flower: support stats update for merge flows")
      Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      net/sched: fix infinite loop in sch_fq_pie · bb2f930d
      Davide Caratti committed
      This command hangs forever:
      
       # tc qdisc add dev eth0 root fq_pie flows 65536
      
       watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028]
       [...]
       CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167
       RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie]
       Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48
       RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
       RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff
       RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590
       RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699
       R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0
       R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8
       FS:  00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0
       Call Trace:
        qdisc_create+0x3fd/0xeb0
        tc_modify_qdisc+0x3be/0x14a0
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x121/0x350
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We can't accept 65536 as a valid number for 'flows', because the loop on
      'idx' in fq_pie_init() would never end. The extack message is correct, but
      it doesn't say that 0 is not a valid number for 'flows' either: while at
      it, fix this too. Add a tdc selftest to check correct validation of 'flows'.
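A sketch of both points, assuming the loop index in fq_pie_init() is 16 bits wide (which is what the hang suggests) and that the accepted range is [1, 65535]. `next_idx_u16()` and `validate_flows()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shows the wrap without actually looping: a uint16_t counter goes from
 * 65535 back to 0, so "idx < 65536" can never become false.
 */
static uint32_t next_idx_u16(uint16_t idx)
{
    return (uint16_t)(idx + 1);
}

/* Hypothetical validation: 'flows' must be in [1, 65535]. */
static const char *validate_flows(uint32_t flows)
{
    if (flows == 0 || flows > 65535)
        return "flows must be between 1 and 65535";
    return NULL;
}
```

Rejecting the value up front, with an error message that also covers 0, turns the soft lockup into an ordinary configuration error.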
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 27 May, 2020: 3 commits