1. 02 May, 2018 2 commits
  2. 01 May, 2018 8 commits
    • I
      net/tls: Add generic NIC offload infrastructure · e8f69799
      Ilya Lesokhin committed
      This patch adds a generic infrastructure to offload TLS crypto to a
      network device. It enables the kernel TLS socket to skip encryption
      and authentication operations on the transmit side of the data path,
      leaving those computationally expensive operations to the NIC.
      
      The NIC offload infrastructure builds TLS records and pushes them to
      the TCP layer just like the SW KTLS implementation and using the same
      API.
      TCP segmentation is mostly unaffected. Currently the only exception is
      that we prevent mixed SKBs where only part of the payload requires
      offload. In the future we are likely to add a similar restriction
      following a change cipher spec record.
      
      The notable differences between SW KTLS and NIC offloaded TLS
      implementations are as follows:
      1. The offloaded implementation builds "plaintext TLS records": records
      that contain plaintext instead of ciphertext and placeholder bytes
      instead of authentication tags.
      2. The offloaded implementation maintains a mapping from TCP sequence
      numbers to TLS records. Thus, given a TCP SKB sent from a NIC-offloaded
      TLS socket, we can use the TLS NIC offload infrastructure to obtain
      enough context to encrypt the payload of the SKB.
      A TLS record is released when the last byte of the record is ACKed;
      this is done through the new icsk_clean_acked callback.
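
The sequence-to-record mapping and its release on ACK can be sketched in user-space C as follows. This is only an illustration under stated assumptions, not the kernel's data structure: `rec_table`, `rec_add`, `rec_lookup`, and `rec_clean_acked` are hypothetical names, and a fixed-size array stands in for whatever list the real infrastructure keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECS 8

/* Hypothetical descriptor: a record covers TCP sequence space [start_seq, end_seq). */
struct rec_info {
    uint32_t start_seq;
    uint32_t end_seq;   /* one past the last byte of the record */
    bool in_use;
};

struct rec_table {
    struct rec_info recs[MAX_RECS];
};

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Remember a newly built record; returns false if the table is full. */
static bool rec_add(struct rec_table *t, uint32_t start_seq, uint32_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < MAX_RECS; i++) {
        if (!t->recs[i].in_use) {
            t->recs[i] = (struct rec_info){ start_seq, start_seq + len, true };
            return true;
        }
    }
    return false;
}

/* Find the record that contains a given TCP sequence number, or NULL. */
static const struct rec_info *rec_lookup(const struct rec_table *t, uint32_t seq)
{
    for (size_t i = 0; i < MAX_RECS; i++) {
        const struct rec_info *r = &t->recs[i];

        if (r->in_use && !seq_before(seq, r->start_seq) &&
            seq_before(seq, r->end_seq))
            return r;
    }
    return NULL;
}

/*
 * Release every record whose last byte has been acknowledged.
 * acked_seq is the cumulative ACK (the next byte the peer expects).
 * Returns the number of records released.
 */
static size_t rec_clean_acked(struct rec_table *t, uint32_t acked_seq)
{
    size_t freed = 0;

    for (size_t i = 0; i < MAX_RECS; i++) {
        struct rec_info *r = &t->recs[i];

        if (r->in_use && !seq_before(acked_seq, r->end_seq)) {
            r->in_use = false;
            freed++;
        }
    }
    return freed;
}
```

A record is kept until the cumulative ACK reaches its end, because any byte of it may still need to be retransmitted and therefore re-encrypted.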
      
      The infrastructure should be extendable to support various NIC offload
      implementations.  However it is currently written with the
      implementation below in mind:
      The NIC assumes that packets from each offloaded stream are sent as
      plaintext and in-order. It keeps track of the TLS records in the TCP
      stream. When a packet marked for offload is transmitted, the NIC
      encrypts the payload in place and puts authentication tags in the
      relevant placeholders.
      
      The responsibility for handling out-of-order packets (i.e. TCP
      retransmission, qdisc drops) falls on the netdev driver.
      
      The netdev driver keeps track of the expected TCP SN from the NIC's
      perspective.  If the next packet to transmit matches the expected TCP
      SN, the driver advances the expected TCP SN, and transmits the packet
      with TLS offload indication.
      
      If the next packet to transmit does not match the expected TCP SN, the
      driver calls the TLS layer to obtain the TLS record that includes the
      TCP SN of the packet for transmission. Using this TLS record, the driver
      posts a work entry on the transmit queue to reconstruct the NIC TLS
      state required for the offload of the out-of-order packet. It updates
      the expected TCP SN accordingly and transmits the now in-order packet.
      The same queue is used for packet transmission and TLS context
      reconstruction to avoid the need for flushing the transmit queue before
      issuing the context reconstruction request.
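
The driver-side decision described above can be sketched as a small state machine. This is a simplified illustration, not driver code: `tx_offload_state` and `tx_classify` are hypothetical names, and the actual record lookup and work-entry posting are reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tx_action {
    TX_OFFLOAD_IN_ORDER,     /* packet matches the NIC's expected TCP SN */
    TX_RESYNC_THEN_OFFLOAD   /* NIC TLS state must be reconstructed first */
};

/* Hypothetical per-stream driver state. */
struct tx_offload_state {
    uint32_t expected_seq;   /* next TCP SN the NIC expects to see */
    unsigned int resyncs;    /* number of context reconstructions requested */
};

/*
 * Decide how to transmit a packet that starts at TCP sequence number seq
 * and carries payload_len bytes, then advance the expected sequence number.
 */
static enum tx_action tx_classify(struct tx_offload_state *st,
                                  uint32_t seq, uint32_t payload_len)
{
    enum tx_action act = TX_OFFLOAD_IN_ORDER;

    assert(st != NULL);
    if (seq != st->expected_seq) {
        /*
         * Out of order (retransmission, qdisc drop). A real driver would
         * fetch the TLS record covering seq and post a context
         * reconstruction entry on the same transmit queue, ahead of the
         * packet itself.
         */
        st->resyncs++;
        act = TX_RESYNC_THEN_OFFLOAD;
    }
    /* In both cases the packet is now in order from the NIC's view. */
    st->expected_seq = seq + payload_len;
    return act;
}
```

Because the reconstruction request and the packet travel through the same queue, the NIC processes them in posting order, which is why no flush is needed in between.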
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      net/tls: Split conf to rx + tx · f66de3ee
      Boris Pismenny committed
      In TLS inline crypto, we can have one direction in software
      and another in hardware. Thus, we split the TLS configuration to separate
      structures for receive and transmit.
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      net: Add TLS TX offload features · 2342a851
      Ilya Lesokhin committed
      This patch adds a netdev feature to configure TLS TX offloads.
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      net: Add Software fallback infrastructure for socket dependent offloads · ebf4e808
      Ilya Lesokhin committed
      With socket-dependent offloads we rely on the netdev to transform
      the transmitted packets before sending them to the wire.
      When a packet from an offloaded socket is rerouted to a different
      device, we need to detect it and do the transformation in software.
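
The detection step can be illustrated with a minimal sketch. The types and names here (`device`, `packet`, `needs_sw_fallback`, `xmit_validate`) are hypothetical stand-ins, and setting a flag replaces the actual software transformation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
    int id;
};

/* Hypothetical packet: remembers which device its socket was offloaded to. */
struct packet {
    const struct device *offload_dev;  /* NULL if the socket is not offloaded */
    bool sw_transformed;               /* set when software did the work */
};

/* True when the packet relies on an offload that the egress device cannot do. */
static bool needs_sw_fallback(const struct packet *p, const struct device *egress)
{
    return p->offload_dev != NULL && p->offload_dev != egress;
}

/* Called on the transmit path before handing the packet to the egress device. */
static void xmit_validate(struct packet *p, const struct device *egress)
{
    assert(p != NULL && egress != NULL);
    if (needs_sw_fallback(p, egress))
        p->sw_transformed = true;  /* stand-in for the software transformation */
}
```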
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      net: Rename and export copy_skb_header · 08303c18
      Ilya Lesokhin committed
      copy_skb_header is renamed to skb_copy_header and
      exported. Exposing this function gives more flexibility
      in copying SKBs.
      skb_copy and skb_copy_expand do not give enough control
      over which parts are copied.
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: Add clean acked data hook · 6dac1523
      Ilya Lesokhin committed
      The hook is called when a TCP segment is acknowledged.
      It can be used by application protocols that hold additional
      metadata associated with the stream data.
      
      This is required by TLS device offload to release
      metadata associated with acknowledged TLS records.
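
The shape of such a hook can be sketched as an optional function pointer invoked from ACK processing. This is an illustration only: `conn`, `conn_handle_ack`, and `record_last_ack` are hypothetical names, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn;

/* Hook signature: called with the new cumulative ACK sequence number. */
typedef void (*clean_acked_fn)(struct conn *c, uint32_t acked_seq);

/* Hypothetical connection state. */
struct conn {
    uint32_t snd_una;            /* oldest unacknowledged sequence number */
    clean_acked_fn clean_acked;  /* optional upper-layer hook, may be NULL */
    void *proto_data;            /* upper-layer private data */
};

/* Process a cumulative ACK and notify the upper layer if it acks new data. */
static void conn_handle_ack(struct conn *c, uint32_t ack_seq)
{
    if ((int32_t)(ack_seq - c->snd_una) <= 0)
        return;                  /* duplicate or old ACK: nothing newly acked */
    c->snd_una = ack_seq;
    if (c->clean_acked)
        c->clean_acked(c, ack_seq);
}

/* Example hook: records the last acknowledged sequence number it was given. */
static void record_last_ack(struct conn *c, uint32_t acked_seq)
{
    assert(c->proto_data != NULL);
    *(uint32_t *)c->proto_data = acked_seq;
}
```

An upper layer such as TLS device offload would use the hook to drop metadata for data that can no longer be retransmitted.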
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      net: bridge: Publish bridge accessor functions · 4d4fd361
      Petr Machata committed
      Add a couple of new functions to allow querying FDB and VLAN settings of a
      bridge.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      ipv6: sr: extract the right key values for "seg6_make_flowlabel" · 6df93462
      Ahmed Abdelsalam committed
      The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the
      flowlabel from a given skb. It relies on skb_get_hash() which eventually
      calls __skb_flow_dissect() to extract the flow_keys struct values from
      the skb.
      
      In the case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(),
      skb_reset_network_header(), and skb_mac_header_rebuild() results in a
      flow_keys struct with all key values set to zero.
      
      This patch calls seg6_make_flowlabel() before resetting the headers of skb
      to get the right key values.
      
      The extracted key values depend on the type of the inner packet as follows:
      1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of the inner packet.
      2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port.
      3) L2 traffic: depends on what kind of traffic is carried in the L2
      frame. IPv6 and IPv4 traffic work as described in 1) and 2).
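
As a rough illustration of deriving a flow label from the inner packet's keys, the sketch below hashes the IPv4 key set from case 2) and keeps the low 20 bits, which is the width of the IPv6 flow label field. The FNV-1a hash and the names `flow4_keys`, `flow4_hash`, and `make_flowlabel` are assumptions for this example; the kernel relies on skb_get_hash() and its own flow dissector instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu  /* the IPv6 flow label is 20 bits wide */

/* Hypothetical key set for inner IPv4 traffic, as listed in case 2). */
struct flow4_keys {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t l4_proto;
};

/* Feed the low nbytes bytes of v into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a_step(uint32_t h, uint32_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xFFu;
        h *= 16777619u;
    }
    return h;
}

/* Hash the keys field by field so struct padding never enters the hash. */
static uint32_t flow4_hash(const struct flow4_keys *k)
{
    uint32_t h = 2166136261u;

    assert(k != NULL);
    h = fnv1a_step(h, k->src_ip, 4);
    h = fnv1a_step(h, k->dst_ip, 4);
    h = fnv1a_step(h, k->src_port, 2);
    h = fnv1a_step(h, k->dst_port, 2);
    h = fnv1a_step(h, k->l4_proto, 1);
    return h;
}

/* Must be computed from the inner packet's keys, before outer headers are built. */
static uint32_t make_flowlabel(const struct flow4_keys *inner)
{
    return flow4_hash(inner) & FLOWLABEL_MASK;
}
```

If the keys were read after the headers had been reset, every field would be zero and all flows would collapse onto one label, which is the failure the patch avoids.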
      
      Here is a hex dump of struct flow_keys for IPv4 and IPv6 traffic:
      10.100.1.100: 47302 > 30.0.0.2: 5001
      00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
      00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
      00000020: 0a 64 01 64
      
      fc00:a1:a > b2::2
      00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
      00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
      00000030: 00 00 00 00 00 00 00 00 00 00 00 0a
      Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Apr, 2018 4 commits
    • W
      erspan: auto detect truncated packets. · 1baf5ebf
      William Tu committed
      Currently the truncated bit is set only when the mirrored packet
      is larger than the MTU.  In certain cases, the packet might already
      have been truncated before being sent to the erspan tunnel.  For such
      cases, the patch detects whether the IP header's total length is larger
      than the actual skb->len.  If true, this indicates that the
      mirrored packet is truncated, and the erspan truncate bit is set.
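
The IPv4 form of this check can be sketched in user-space C: read the big-endian total length from bytes 2 and 3 of the IP header and compare it with the number of bytes actually available. `ipv4_looks_truncated` is a hypothetical name, and treating a buffer too short to hold the length field as truncated is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ip_hdr points at the start of an IPv4 header and avail_len is the number
 * of bytes actually present from that point on (the analogue of skb->len).
 */
static bool ipv4_looks_truncated(const uint8_t *ip_hdr, size_t avail_len)
{
    uint16_t tot_len;

    assert(ip_hdr != NULL);
    if (avail_len < 4)
        return true;  /* assumption: too short to even read the length field */

    /* Total length is a 16-bit big-endian field at byte offset 2. */
    tot_len = (uint16_t)((ip_hdr[2] << 8) | ip_hdr[3]);
    return tot_len > avail_len;
}
```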
      
      I tested the patch using the bpf_skb_change_tail helper function to
      shrink the packet size and send it to the erspan tunnel.
      Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      net: core: Assert the size of netdev_features_t · 3ac305c3
      Florian Fainelli committed
      We have about 53 netdev_features_t bits defined and counting, so add a
      build-time check to catch when a u64 type will not be enough and we
      will have to convert that to a bitmap. This is done in
      register_netdevice() for convenience.
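
A build-time check of this kind can be sketched in standard C11 with `_Static_assert`; kernel code would typically use its own BUILD_BUG_ON() macro instead. The type `features_t` and the feature names below are hypothetical stand-ins for netdev_features_t and its bit enumeration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit feature mask type. */
typedef uint64_t features_t;

/* Hypothetical feature bit numbers; the last entry counts them. */
enum feature_bit_nr {
    FEAT_SG_BIT,
    FEAT_CSUM_BIT,
    FEAT_TLS_TX_BIT,
    FEATURE_COUNT
};

/* Compilation fails as soon as there are more feature bits than mask bits. */
_Static_assert(FEATURE_COUNT <= sizeof(features_t) * CHAR_BIT,
               "features_t is too small: convert it to a bitmap");

/* Turn a feature bit number into its mask. */
static features_t feature_mask(unsigned int bit)
{
    assert(bit < FEATURE_COUNT);
    return (features_t)1 << bit;
}
```

The check costs nothing at run time, so it can sit next to any code that already depends on the mask fitting into one integer.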
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash · 1b837d48
      Alexander Duyck committed
      I am dropping the export of __skb_tx_hash as after my patches nobody is
      using it outside of the net/core/dev.c file. In addition I am renaming and
      repurposing it to just be a static declaration of skb_tx_hash, since that
      was the only user for it at this point. By doing this the compiler can
      inline it into __netdev_pick_tx, which should improve performance.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive · 05255b82
      Eric Dumazet committed
      When adding tcp mmap() implementation, I forgot that socket lock
      had to be taken before current->mm->mmap_sem. syzbot eventually caught
      the bug.
      
      Since we cannot lock the socket in the tcp mmap() handler, we have to
      split the operation into two phases.
      
      1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
        This operation does not involve any TCP locking.
      
      2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
        the transfer of pages from skbs to one VMA.
        This operation only uses down_read(&current->mm->mmap_sem) after
        taking the TCP socket lock, thus solving the lockdep issue.
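
From user space, the two phases might look like the sketch below. Several things here are assumptions rather than facts from the commit text: the option value 35 used when the system headers do not define TCP_ZEROCOPY_RECEIVE, the local `zc_args` struct (meant to mirror the address, length, and skip-hint fields of struct tcp_zerocopy_receive), and the helper names `zc_reserve` and `zc_receive`. Error handling is minimal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef TCP_ZEROCOPY_RECEIVE
#define TCP_ZEROCOPY_RECEIVE 35  /* assumed value when headers lack it */
#endif

/* Assumed layout, mirroring the fields of struct tcp_zerocopy_receive. */
struct zc_args {
    uint64_t address;         /* in: start of the reserved VMA */
    uint32_t length;          /* in: bytes wanted; out: bytes mapped */
    uint32_t recv_skip_hint;  /* out: bytes to consume with a normal read */
};

/* Phase 1: reserve VMA space on the socket. No TCP locking is involved. */
static void *zc_reserve(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Phase 2: ask the kernel to move pages from received skbs into the VMA.
 * Returns 0 on success and reports how many bytes were mapped and how many
 * should be read the ordinary way to skip an unmappable sequence.
 */
static int zc_receive(int fd, void *addr, uint32_t len,
                      uint32_t *mapped, uint32_t *skip)
{
    struct zc_args zc = { (uint64_t)(uintptr_t)addr, len, 0 };
    socklen_t zc_len = sizeof(zc);

    assert(mapped != NULL && skip != NULL);
    if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len) == -1)
        return -1;
    *mapped = zc.length;
    *skip = zc.recv_skip_hint;
    return 0;
}
```

A receiver would typically call zc_reserve() once, then loop on zc_receive(), consuming the mapped bytes and falling back to an ordinary read for any reported skip amount.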
      
      This new implementation was suggested by Andy Lutomirski in great detail.
      
      Benefits are:
      
      - Better scalability, in case multiple threads reuse VMAs
         (without mmap()/munmap() calls), since mmap_sem won't be write locked.
      
      - Better error recovery.
         The previous mmap() model had to provide the expected size of the
         mapping. If for some reason one part could not be mapped (partial MSS),
         the whole operation had to be aborted.
         With the tcp_zerocopy_receive struct, the kernel can report how
         many bytes were successfully mapped, and how many bytes should
         be read to skip the problematic sequence.
      
      - No more memory allocation to hold an array of page pointers.
        16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
      
      - skbs are freed after mmap_sem has been released.
      
      The following patch changes the tcp_mmap tool to demonstrate
      one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...).
      
      Note that memcg might require additional changes.
      
      Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Cc: linux-mm@kvack.org
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 28 Apr, 2018 21 commits
  5. 27 Apr, 2018 5 commits