1. 24 Aug, 2020 1 commit
  2. 04 Aug, 2020 1 commit
    • net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct · 038ebb1a
      wenxu authored
      When openvswitch conntrack is offloaded with the act_ct action, fragmented
      packets are defragmented in the ingress tc act_ct action and then miss the
      next chain. The packet is then passed to the openvswitch datapath without
      the mru, so the over-mtu packet is dropped in the openvswitch output action:
      
      "kernel: net2: dropped over-mtu packet: 1528 > 1500"
      
      This patch adds the mru to tc_skb_ext for the defrag-then-miss-next-chain
      situation, and also adds the mru to qdisc_skb_cb. act_ct sets the mru in
      qdisc_skb_cb when the packet is defragmented. When the chain misses, the
      mru is copied to tc_skb_ext, where the ovs datapath can read it.
      
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Jul, 2020 1 commit
  4. 16 Jul, 2020 1 commit
  5. 30 Jun, 2020 1 commit
    • iov_iter: Move unnecessary inclusion of crypto/hash.h · 7999096f
      Herbert Xu authored
      The header file linux/uio.h includes crypto/hash.h which pulls in
      most of the Crypto API.  Since linux/uio.h is used throughout the
      kernel this means that every tiny bit of change to the Crypto API
      causes the entire kernel to get rebuilt.
      
      This patch fixes this by moving it into lib/iov_iter.c instead
      where it is actually used.
      
      This patch also fixes the ifdef to use CRYPTO_HASH instead of just
      CRYPTO which does not guarantee the existence of ahash.
      
      Unfortunately a number of drivers were relying on linux/uio.h to
      provide access to linux/slab.h.  This patch adds inclusions of
      linux/slab.h as detected by build failures.
      
      Also skbuff.h was relying on this to provide a declaration for
      ahash_request.  This patch adds a forward declaration instead.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
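The forward-declaration technique mentioned in 7999096f can be illustrated with a small standalone sketch. It is not the kernel's code: `struct hash_ctx` and `hash_ctx_has_request()` are hypothetical names invented for this example, and only the `struct ahash_request` tag comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: tells the compiler that the type exists without
 * including the header that defines its layout. */
struct ahash_request;

/* Hypothetical record (not a kernel structure). It only stores a pointer,
 * so the incomplete type is sufficient here. */
struct hash_ctx {
	struct ahash_request *req;
	size_t bytes_hashed;
};

/* Pointers to an incomplete type can be stored, passed and compared,
 * but never dereferenced. */
static inline int hash_ctx_has_request(const struct hash_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->req != NULL;
}
```

A header written this way does not drag the full definition into every file that includes it; only the code that actually dereferences the pointer needs the defining header.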
  6. 03 Jun, 2020 1 commit
    • bpf: Fix up bpf_skb_adjust_room helper's skb csum setting · 836e66c2
      Daniel Borkmann authored
      Lorenz recently reported:
      
        In our TC classifier cls_redirect [0], we use the following sequence of
        helper calls to decapsulate a GUE (basically IP + UDP + custom header)
        encapsulated packet:
      
          bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
          bpf_redirect(skb->ifindex, BPF_F_INGRESS)
      
        It seems like some checksums of the inner headers are not validated in
        this case. For example, a TCP SYN packet with invalid TCP checksum is
        still accepted by the network stack and elicits a SYN ACK. [...]
      
        That is, we receive the following packet from the driver:
      
          | ETH | IP | UDP | GUE | IP | TCP |
          skb->ip_summed == CHECKSUM_UNNECESSARY
      
        ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
        On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
      
          | ETH | IP | TCP |
          skb->ip_summed == CHECKSUM_UNNECESSARY
      
        Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
        into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
        turned into a no-op due to CHECKSUM_UNNECESSARY.
      
      The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
      it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
      not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
      skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
      there is no way to adjust the skb->csum_level. NICs that have checksum offload
      disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
      
      Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
      add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
      to opt out. Opting out is useful for the case where we don't remove/add
      full protocol headers, or for the case where a user wants to adjust the csum
      level manually e.g. through the bpf_csum_level() helper that is added in a
      subsequent patch.
      
      The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
      bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
      as well but doesn't change layers, only transitions between v4 and v6 and vice
      versa, therefore no adaptation is required there.
      
        [0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
      
      Fixes: 2be7e212 ("bpf: add bpf_skb_adjust_room helper")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Reported-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
      Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
  7. 02 Jun, 2020 1 commit
  8. 18 May, 2020 1 commit
  9. 19 Apr, 2020 1 commit
    • skbuff.h: Replace zero-length array with flexible-array member · 5c91aa1d
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also note that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
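To show why allocations are unaffected by the change in 5c91aa1d, here is a minimal user-space sketch that allocates and uses a structure ending in a flexible array member. `struct foo` and `struct boo` follow the commit message; the `value` field, `foo_alloc()` and `foo_set()` are hypothetical additions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;		/* here: number of trailing elements */
	struct boo array[];	/* flexible array member, must come last */
};

/* Allocate a foo with room for n trailing elements. sizeof(*f) covers only
 * the fixed part, so the array storage is added explicitly. */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}

/* Bounds-checked store into the trailing array. */
static void foo_set(struct foo *f, size_t i, int v)
{
	assert(f != NULL && i < (size_t)f->stuff);
	f->array[i].value = v;
}
```

The size computation is the same one that was used with a zero-length array. In kernel code the `struct_size()` helper is commonly used for this arithmetic because it also guards against overflow.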
  10. 07 Apr, 2020 1 commit
  11. 30 Mar, 2020 1 commit
  12. 26 Mar, 2020 1 commit
    • net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b
      Pablo Neira Ayuso authored
      net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
          net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
            pkt->skb->tc_redirected = 1;
                    ^~
          net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
            pkt->skb->tc_from_ingress = 1;
                    ^~
      
      To avoid a direct dependency with tc actions from netfilter, wrap the
      redirect bits around CONFIG_NET_REDIRECT and move helpers to
      include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
      only existing client of these bits in the tree.
      
      This patch adds skb_set_redirected(), which sets the redirected bit
      on the skbuff, records whether the packet was redirected from ingress,
      and resets the timestamp (the timestamp reset was originally missing in
      the netfilter bugfix).
      
      Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
      Reported-by: noreply@ellerman.id.au
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 29 Feb, 2020 1 commit
  14. 17 Feb, 2020 1 commit
    • skbuff.h: fix all kernel-doc warnings · d2f273f0
      Randy Dunlap authored
      Fix all kernel-doc warnings in <linux/skbuff.h>.
      Fixes these warnings:
      
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'list' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 06 Feb, 2020 1 commit
    • skbuff: fix a data race in skb_queue_len() · 86b18aaa
      Qian Cai authored
      sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
      
       read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
        unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
      				 net/unix/af_unix.c:1761
        ____sys_sendmsg+0x33e/0x370
        ___sys_sendmsg+0xa6/0xf0
        __sys_sendmsg+0x69/0xf0
        __x64_sys_sendmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
        __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
        __skb_try_recv_datagram+0xbe/0x220
        unix_dgram_recvmsg+0xee/0x850
        ____sys_recvmsg+0x1fb/0x210
        ___sys_recvmsg+0xa2/0xf0
        __sys_recvmsg+0x66/0xf0
        __x64_sys_recvmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Since only the read is lockless, load tearing could introduce a logic
      bug in unix_recvq_full(). Fix it by adding lockless variants of
      skb_queue_len() and unix_recvq_full() that use READ_ONCE() on the read
      side, with WRITE_ONCE() on the write side, similar to
      commit d7d16a89 ("net: add skb_queue_empty_lockless()").
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 27 Jan, 2020 2 commits
  17. 15 Jan, 2020 1 commit
  18. 10 Jan, 2020 2 commits
  19. 09 Jan, 2020 1 commit
    • net: introduce skb_list_walk_safe for skb segment walking · dcfea72e
      Jason A. Donenfeld authored
      As part of the continual effort to remove direct usage of skb->next and
      skb->prev, this patch adds a helper for iterating through the
      singly-linked variant of skb lists, which are used for lists of GSO
      packets. The name "skb_list_..." has been chosen to match the existing
      function, "kfree_skb_list", which also operates on these singly-linked
      lists, and the "..._walk_safe" part is the same idiom as elsewhere in
      the kernel.
      
      This patch removes the helper from wireguard and puts it into
      linux/skbuff.h, while making it a bit more robust for general usage. In
      particular, parentheses are added around the macro argument usage, and it
      now accounts for trying to iterate through an already-null skb pointer,
      which will simply run the iteration zero times. This latter enhancement
      means it can be used to replace both do { ... } while and while (...)
      open-coded idioms.
      
      This should take care of these three possible usages, which match all
      current methods of iterations.
      
      skb_list_walk_safe(segs, skb, next) { ... }
      skb_list_walk_safe(skb, skb, next) { ... }
      skb_list_walk_safe(segs, skb, segs) { ... }
      
      Gcc appears to generate efficient code for each of these.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
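The "walk safe" idiom from dcfea72e can be tried out in user space with a macro of the same shape. This sketch is modeled on the description in the commit message; `struct seg`, `seg_list_walk_safe()`, `seg_list_build()` and `seg_list_free_all()` are hypothetical names that operate on a plain singly-linked list rather than on sk_buffs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly-linked node standing in for a GSO segment. */
struct seg {
	struct seg *next;
	int id;
};

/* Cache the successor before the body runs, so the body may free or unlink
 * the current node. Arguments are parenthesized, and a NULL first element
 * simply yields zero iterations. */
#define seg_list_walk_safe(first, seg, next_seg)                              \
	for ((seg) = (first), (next_seg) = (seg) ? (seg)->next : NULL; (seg); \
	     (seg) = (next_seg), (next_seg) = (seg) ? (seg)->next : NULL)

/* Build a list with ids 0..n-1. */
static struct seg *seg_list_build(int n)
{
	struct seg *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct seg *s = malloc(sizeof(*s));

		assert(s != NULL);
		s->id = i;
		s->next = head;
		head = s;
	}
	return head;
}

/* Free every node while walking and return how many were freed. */
static int seg_list_free_all(struct seg *segs)
{
	struct seg *seg, *next;
	int freed = 0;

	seg_list_walk_safe(segs, seg, next) {
		free(seg);
		freed++;
	}
	return freed;
}
```

Because the NULL case is handled inside the macro, a caller does not need a separate `if (segs)` guard around the loop, which is what allows one helper to replace both the `do { ... } while` and the `while (...)` open-coded forms.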
  20. 09 Dec, 2019 1 commit
  21. 05 Dec, 2019 1 commit
    • net: Fixed updating of ethertype in skb_mpls_push() · d04ac224
      Martin Varghese authored
      The skb_mpls_push was not updating ethertype of an ethernet packet if
      the packet was originally received from a non ARPHRD_ETHER device.
      
      In the below OVS data path flow, since the device corresponding to
      port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
      not update the ethertype of the packet even though the previous
      push_eth action had added an ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
      
      Fixes: 8822e270 ("net: core: move push MPLS functionality from OvS to core helper")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 03 Dec, 2019 1 commit
    • Fixed updating of ethertype in function skb_mpls_pop · 040b5cfb
      Martin Varghese authored
      The skb_mpls_pop was not updating ethertype of an ethernet packet if the
      packet was originally received from a non ARPHRD_ETHER device.
      
      In the below OVS data path flow, since the device corresponding to port 7
      is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
      the ethertype of the packet even though the previous push_eth action had
      added an ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x8847),
      mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      pop_mpls(eth_type=0x800),4
      
      Fixes: ed246cee ("net: core: move pop MPLS functionality from OvS to core helper")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 23 Nov, 2019 1 commit
  24. 15 Nov, 2019 1 commit
    • y2038: socket: use __kernel_old_timespec instead of timespec · df1b4ba9
      Arnd Bergmann authored
      The 'timespec' type definition and helpers like ktime_to_timespec()
      or timespec64_to_timespec() should no longer be used in the kernel so
      we can remove them and avoid introducing y2038 issues in new code.
      
      Change the socket code that needs to pass a timespec to user space for
      backward compatibility to use __kernel_old_timespec instead.  This type
      has the same layout but with a clearer defined name.
      
      Slightly reformat tcp_recv_timestamp() for consistency after the removal
      of timespec64_to_timespec().
      Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  25. 08 Nov, 2019 1 commit
    • net: add a READ_ONCE() in skb_peek_tail() · f8cc62ca
      Eric Dumazet authored
      skb_peek_tail() can be used without protection of a lock,
      as spotted by KCSAN [1]
      
      In order to avoid load-tearing, add a READ_ONCE()
      
      Note that the corresponding WRITE_ONCE() are already there.
      
      [1]
      BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail
      
      read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1:
       skb_peek_tail include/linux/skbuff.h:1784 [inline]
       sk_wait_data+0x15b/0x250 net/core/sock.c:2477
       kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103
       kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145
       kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206
       kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370
       __strp_recv+0x348/0xf50 net/strparser/strparser.c:309
       strp_recv+0x84/0xa0 net/strparser/strparser.c:343
       tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639
       strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366
       do_strp_work net/strparser/strparser.c:414 [inline]
       strp_work+0x9a/0xe0 net/strparser/strparser.c:423
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kstrp strp_work
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 29 Oct, 2019 1 commit
    • net: add skb_queue_empty_lockless() · d7d16a89
      Eric Dumazet authored
      Some paths call skb_queue_empty() without holding
      the queue lock. We must use a barrier in order
      to not let the compiler do strange things, and avoid
      KCSAN splats.
      
      Adding a barrier in skb_queue_empty() might be overkill,
      I prefer adding a new helper to clearly identify
      points where the callers might be lockless. This might
      help us find real bugs.
      
      The corresponding WRITE_ONCE() should add zero cost
      for current compilers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
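A dedicated lockless emptiness check, as motivated in d7d16a89, might look like the following user-space sketch on a circular doubly-linked list. The macros are simplified volatile-based stand-ins that rely on the GCC/Clang `__typeof__` extension, and `struct mock_head` and its helpers are hypothetical rather than the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Circular list head: an empty queue points back at itself. */
struct mock_head {
	struct mock_head *next;
	struct mock_head *prev;
};

static inline void mock_head_init(struct mock_head *h)
{
	assert(h != NULL);
	h->next = h->prev = h;
}

/* Separate helper for callers that do not hold the queue lock; the name
 * makes such call sites easy to find and audit. */
static inline int mock_queue_empty_lockless(const struct mock_head *h)
{
	return READ_ONCE(h->next) == h;
}

/* Writer: insert n right after the head. The stores to fields a lockless
 * reader may look at are marked with WRITE_ONCE(). */
static inline void mock_queue_head_add(struct mock_head *h, struct mock_head *n)
{
	n->next = h->next;
	n->prev = h;
	WRITE_ONCE(h->next->prev, n);
	WRITE_ONCE(h->next, n);
}
```

Keeping the plain and the lockless checks as two functions follows the reasoning in the commit message: locked callers pay nothing extra, and lockless callers are labeled as such.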
  27. 24 Oct, 2019 1 commit
    • net/flow_dissector: switch to siphash · 55667441
      Eric Dumazet authored
      UDP IPv6 packets auto flowlabels are using a 32bit secret
      (static u32 hashrnd in net/core/flow_dissector.c) and
      apply jhash() over fields known by the receivers.
      
      Attackers can easily infer the 32bit secret and use this information
      to identify a device and/or user, since this 32bit secret is only
      set at boot time.
      
      Really, using jhash() to generate cookies sent on the wire
      is a serious security concern.
      
      Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
      a dead end. Trying to periodically change the secret (like in sch_sfq.c)
      could change paths taken in the network for long lived flows.
      
      Let's switch to siphash, as we did in commit df453700
      ("inet: switch IP ID generator to siphash")
      
      Using a cryptographically strong pseudo random function will solve this
      privacy issue and more generally remove other weak points in the stack.
      
      Packet schedulers using skb_get_hash_perturb() benefit from this change.
      
      Fixes: b5677416 ("ipv6: Enable auto flow labels by default")
      Fixes: 42240901 ("ipv6: Implement different admin modes for automatic flow labels")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Berger <jonathann1@walla.com>
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Reported-by: Benny Pinkas <benny@pinkas.net>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 16 Oct, 2019 1 commit
    • net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions · fa4e0f88
      Davide Caratti authored
      the following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress protocol ip matchall \
       > action mpls push protocol mpls_uc label 0x355aa bos 1
      
      causes corruption of all IP packets transmitted by eth0. On TC egress, we
      can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
      operation will result in an overwrite of the first 4 octets in the packet
      L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
      error pattern is present also in the MPLS 'pop' operation. Fix this error
      in act_mpls data plane, computing 'mac_len' as the difference between the
      network header and the mac header (when not at TC ingress), and use it in
      MPLS 'push'/'pop' core functions.
      
      v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
          in skb_mpls_pop(), reported by kbuild test robot
      
      CC: Lorenzo Bianconi <lorenzo@kernel.org>
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 07 Oct, 2019 1 commit
  30. 02 Oct, 2019 1 commit
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal authored
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  31. 28 Sep, 2019 1 commit
    • sk_buff: drop all skb extensions on free and skb scrubbing · 174e2381
      Florian Westphal authored
      Now that we have a 3rd extension, add a new helper that drops the
      extension space and use it when we need to scrub an sk_buff.
      
      At this time, scrubbing clears secpath and bridge netfilter data, but
      retains the tc skb extension; after this patch all three get cleared.
      
      NAPI reuse/free assumes we can only have a secpath attached to skb, but
      it seems better to clear all extensions there as well.
      
      v2: add unlikely hint (Eric Dumazet)
      
      Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 13 Sep, 2019 1 commit
  33. 06 Sep, 2019 1 commit
    • net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey authored
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel through tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the processing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. Last tc chain index will be set
      by tc goto chain action and read by OvS datapath.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 09 Aug, 2019 1 commit
    • net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Jakub Kicinski authored
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket such as skb_orphan() or skb_clone() will cause clear text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted, if TLS TX is in offload mode.
      Then in sk_validate_xmit_skb() catch skbs which have no socket
      (or a socket with no validation) and decrypted flag set.
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now,
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that's the option guarding the
      sk_buff->decrypted member.
      
      Second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location which could leak the flag in is tcp_bpf_sendmsg(),
      which is taken care of by clearing the previously unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 31 Jul, 2019 2 commits
  36. 26 Jul, 2019 1 commit
  37. 23 Jul, 2019 1 commit