1. 16 10月, 2019 1 次提交
    • D
      net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions · fa4e0f88
      Davide Caratti 提交于
      the following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress protocol ip matchall \
       > action mpls push protocol mpls_uc label 0x355aa bos 1
      
      causes corruption of all IP packets transmitted by eth0. On TC egress, we
      can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
      operation will result in an overwrite of the first 4 octets in the packet
      L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
      error pattern is present also in the MPLS 'pop' operation. Fix this error
      in act_mpls data plane, computing 'mac_len' as the difference between the
      network header and the mac header (when not at TC ingress), and use it in
      MPLS 'push'/'pop' core functions.
      
      v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
          in skb_mpls_pop(), reported by kbuild test robot
      
      CC: Lorenzo Bianconi <lorenzo@kernel.org>
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa4e0f88
  2. 02 10月, 2019 1 次提交
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal 提交于
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
  3. 28 9月, 2019 1 次提交
    • F
      sk_buff: drop all skb extensions on free and skb scrubbing · 174e2381
      Florian Westphal 提交于
      Now that we have a 3rd extension, add a new helper that drops the
      extension space and use it when we need to scrub an sk_buff.
      
      At this time, scrubbing clears secpath and bridge netfilter data, but
      retains the tc skb extension, after this patch all three get cleared.
      
      NAPI reuse/free assumes we can only have a secpath attached to skb, but
      it seems better to clear all extensions there as well.
      
      v2: add unlikely hint (Eric Dumazet)
      
      Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      174e2381
  4. 13 9月, 2019 1 次提交
  5. 06 9月, 2019 1 次提交
    • P
      net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey 提交于
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel though tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the proccessing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. Last tc chain index will be set
      by tc goto chain action and read by OvS datapath.
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95a7233c
  6. 09 8月, 2019 1 次提交
    • J
      net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Jakub Kicinski 提交于
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket such as skb_orphan() or skb_clone() will cause clear text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted, if TLS TX is in offload mode.
      Then in sk_validate_xmit_skb() catch skbs which have no socket
      (or a socket with no validation) and decrypted flag set.
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now,
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that's the option guarding the
      sk_buff->decrypted member.
      
      Second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location which could leak the flag in is tcp_bpf_sendmsg(),
      which is taken care of by clearing the previously unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41477662
  7. 31 7月, 2019 2 次提交
  8. 26 7月, 2019 1 次提交
  9. 23 7月, 2019 6 次提交
  10. 10 7月, 2019 1 次提交
  11. 09 7月, 2019 5 次提交
  12. 06 7月, 2019 1 次提交
  13. 19 6月, 2019 1 次提交
  14. 05 6月, 2019 1 次提交
  15. 31 5月, 2019 1 次提交
  16. 17 5月, 2019 1 次提交
    • W
      net: test nouarg before dereferencing zerocopy pointers · 185ce5c3
      Willem de Bruijn 提交于
      Zerocopy skbs without completion notification were added for packet
      sockets with PACKET_TX_RING user buffers. Those signal completion
      through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
      added only to avoid premature notification after clone or orphan, by
      triggering a copy on these paths for these packets.
      
      The mechanism had to define a special "no-uarg" mode because packet
      sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
      for a different pointer.
      
      Before deferencing skb_uarg(skb), verify that it is a real pointer.
      
      Fixes: 5cd8d46e ("packet: copy user buffers before orphan or clone")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      185ce5c3
  17. 26 4月, 2019 1 次提交
  18. 24 4月, 2019 3 次提交
  19. 19 4月, 2019 1 次提交
  20. 18 4月, 2019 1 次提交
  21. 09 4月, 2019 1 次提交
  22. 02 4月, 2019 1 次提交
  23. 09 3月, 2019 1 次提交
  24. 07 3月, 2019 1 次提交
  25. 23 2月, 2019 1 次提交
    • M
      net: Don't set transport offset to invalid value · d2aa125d
      Maxim Mikityanskiy 提交于
      If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
      skb->protocol will be unset, __skb_flow_dissect() will fail, and
      skb_probe_transport_header() will fall back to the offset_hint, making
      the resulting skb_transport_offset incorrect.
      
      If, however, there is no transport header in the packet,
      transport_header shouldn't be set to an arbitrary value.
      
      Fix it by leaving the transport offset unset if it couldn't be found, to
      be explicit rather than to fill it with some wrong value. It changes the
      behavior, but if some code relied on the old behavior, it would be
      broken anyway, as the old one is incorrect.
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2aa125d
  26. 16 2月, 2019 1 次提交
  27. 13 2月, 2019 1 次提交
    • B
      net/skbuff: fix up kernel-doc placement · 4ea7b0cf
      Brian Norris 提交于
      There are several skb_* functions where the locked and unlocked
      functions are confusingly documented. For several of them, the
      kernel-doc for the unlocked version is placed above the locked version,
      which to the casual reader makes it seems like the locked version "takes
      no locks and you must therefore hold required locks before calling it."
      
      One can see, for example, that this link claims to document
      skb_queue_head(), while instead describing __skb_queue_head().
      
      https://www.kernel.org/doc/html/latest/networking/kapi.html#c.skb_queue_head
      
      The correct documentation for skb_queue_head() is also included further
      down the page.
      
      This diff tested via:
      
        $ scripts/kernel-doc -rst include/linux/skbuff.h net/core/skbuff.c
      
      No new warnings were seen, and the output makes a little more sense.
      Signed-off-by: NBrian Norris <briannorris@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ea7b0cf
  28. 11 2月, 2019 1 次提交
    • W
      bpf: only adjust gso_size on bytestream protocols · b90efd22
      Willem de Bruijn 提交于
      bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
      For GSO packets they adjust gso_size to maintain the same MTU.
      
      The gso size can only be safely adjusted on bytestream protocols.
      Commit d02f51cb ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
      to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
      
      Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
      gso_size unit per datagram. Also exclude these.
      
      Move from a blacklist to a whitelist check to future proof against
      additional such new GSO types, e.g., for fraglist based GRO.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b90efd22