1. 19 Jan 2021, 2 commits
      ipv6: set multicast flag on the multicast route · ceed9038
      Committed by Matteo Croce
      The multicast route ff00::/8 is created with type RTN_UNICAST:
      
        $ ip -6 -d route
        unicast ::1 dev lo proto kernel scope global metric 256 pref medium
        unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
        unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium
      
      Set the type to RTN_MULTICAST, which is more appropriate.
      
      Fixes: e8478e80 ("net/ipv6: Save route type in rt6_info")
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ipv6: create multicast route with RTPROT_KERNEL · a826b043
      Committed by Matteo Croce
      The ff00::/8 multicast route is created without specifying the fc_protocol
      field, so the default RTPROT_BOOT value is used:
      
        $ ip -6 -d route
        unicast ::1 dev lo proto kernel scope global metric 256 pref medium
        unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
        unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium
      
      As the documentation says, this value identifies routes installed during
      boot, but this route is created when the interface is set up.
      Change the value to RTPROT_KERNEL, which is more appropriate.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Jan 2021, 1 commit
  3. 12 Jan 2021, 1 commit
      esp: avoid unneeded kmap_atomic call · 9bd6b629
      Committed by Willem de Bruijn
      esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
      the esp trailer.
      
      It accesses the page with kmap_atomic to handle highmem. But
      skb_page_frag_refill can return compound pages, of which
      kmap_atomic only maps the first underlying page.
      
      skb_page_frag_refill does not return highmem, because the __GFP_HIGHMEM
      flag is not set. ESP uses it in the same manner as TCP, which also does
      not call kmap_atomic but directly uses page_address, in
      skb_copy_to_page_nocache. Do the same for ESP.
      
      This issue has become easier to trigger with the recent kmap local
      debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 10 Jan 2021, 1 commit
      net: ipv6: Validate GSO SKB before finish IPv6 processing · b210de4f
      Committed by Aya Levin
      There are cases where a GSO segment's length exceeds the egress MTU:
       - Forwarding of a TCP GRO skb, when the DF flag is not set.
       - Forwarding of an skb that arrived on a virtualisation interface
         (virtio-net/vhost/tap) with TSO/GSO size set by another network
         stack.
       - Local GSO skb transmitted on a NETIF_F_TSO tunnel stacked over an
         interface with a smaller MTU.
       - Arriving GRO skb (or GSO skb in a virtualised environment) that is
         bridged to a NETIF_F_TSO tunnel stacked over an interface with an
         insufficient MTU.
      
      If so:
       - Consume the SKB and its segments.
       - Issue an ICMP packet with 'Packet Too Big' message containing the
         MTU, allowing the source host to reduce its Path MTU appropriately.
      
      Note: These cases are handled in the same manner in IPv4 output finish.
      This patch aligns the behavior of IPv6 with that of IPv4.
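
      A minimal userspace sketch of the check described above, under a
      simplified packet model: struct gso_pkt, finish_output_gso() and the
      OUT_* values are hypothetical names, not kernel APIs. It only models the
      length comparison and the MTU that would be reported. Freeing the
      segments and building the ICMPv6 message are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a GSO packet; the kernel's
 * struct sk_buff is far richer. */
struct gso_pkt {
    size_t net_hdr_len; /* IPv6 header + extension headers + L4 header */
    size_t gso_size;    /* payload bytes carried by each segment */
};

enum out_action { OUT_TRANSMIT, OUT_DROP_SEND_PTB };

/* Network-layer length of each full-sized segment GSO would produce. */
static size_t gso_segment_len(const struct gso_pkt *p)
{
    return p->net_hdr_len + p->gso_size;
}

/* If the segments fit the egress MTU, transmit. Otherwise drop the packet
 * and report the MTU, which the caller would place in an ICMPv6
 * "Packet Too Big" message so the source can lower its Path MTU. */
static enum out_action finish_output_gso(const struct gso_pkt *p,
                                         size_t mtu, size_t *ptb_mtu)
{
    if (gso_segment_len(p) <= mtu)
        return OUT_TRANSMIT;
    *ptb_mtu = mtu;
    return OUT_DROP_SEND_PTB;
}
```

      With 60 bytes of headers and a gso_size of 1448, each segment would be
      1508 bytes. On a 1500-byte MTU, the sketch therefore returns
      OUT_DROP_SEND_PTB and reports 1500.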
      
      Fixes: 9e508490 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 08 Jan 2021, 1 commit
      net: ipv6: fib: flush exceptions when purging route · d8f5c296
      Committed by Sean Tranchetti
      Route removal is handled by two code paths. The main removal path is via
      fib6_del_route() which will handle purging any PMTU exceptions from the
      cache, removing all per-cpu copies of the DST entry used by the route, and
      releasing the fib6_info struct.
      
      The second removal path is in fib6_add_rt2node(), during a route
      replacement operation. This path also calls fib6_purge_rt() to handle
      cleaning up the per-cpu copies of the DST entries and releasing the
      fib6_info associated with the older route, but it does not flush any PMTU
      exceptions that the older route had. Since the older route is removed from
      the tree during the replacement, we lose any way of accessing it again.
      
      As these lingering DSTs and the fib6_info struct are holding references to
      the underlying netdevice struct as well, unregistering that device from the
      kernel can never complete.
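
      A simplified model of the leak and of the idea behind the fix, assuming
      a plain integer stands in for the netdevice reference count. struct
      netdev, struct route, route_purge() and the other helpers are
      hypothetical. The point is that the count can only return to zero,
      letting unregistration finish, when the purge shared by both removal
      paths also releases the cached exceptions.

```c
#include <stdlib.h>

/* Hypothetical model; names are illustrative, not the kernel's. */
struct netdev { int refcnt; };

struct exception {            /* a cached PMTU exception entry */
    struct exception *next;
    struct netdev *dev;       /* each cached entry pins the device */
};

struct route {
    struct netdev *dev;       /* the route itself pins the device too */
    struct exception *exceptions;
};

static void netdev_ref(struct netdev *d)   { d->refcnt++; }
static void netdev_unref(struct netdev *d) { d->refcnt--; }

static int route_add_exception(struct route *rt)
{
    struct exception *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->dev = rt->dev;
    netdev_ref(e->dev);
    e->next = rt->exceptions;
    rt->exceptions = e;
    return 0;
}

/* Free every cached exception and drop the reference each one holds. */
static void route_flush_exceptions(struct route *rt)
{
    struct exception *e = rt->exceptions;
    while (e) {
        struct exception *next = e->next;
        netdev_unref(e->dev);
        free(e);
        e = next;
    }
    rt->exceptions = NULL;
}

/* Purge shared by deletion and replacement: because it flushes the
 * exceptions itself, neither path leaves device references behind. */
static void route_purge(struct route *rt)
{
    route_flush_exceptions(rt);
    netdev_unref(rt->dev);
}
```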
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 18 Dec 2020, 1 commit
  7. 10 Dec 2020, 1 commit
  8. 09 Dec 2020, 1 commit
  9. 08 Dec 2020, 1 commit
      netfilter: x_tables: Switch synchronization to RCU · cc00bcaa
      Committed by Subash Abhinov Kasiviswanathan
      When iptables rules are replaced while packets are being processed, the
      per-CPU sequence count is checked after the new table information is
      assigned. The sequence count is used to synchronize with the packet path
      without any explicit locking. When a packet starts using the table
      information, the sequence count is incremented to an odd value. It is
      incremented back to an even value when packet processing completes.
      
      The new table value assignment is followed by a write memory barrier so every
      CPU should see the latest value. If the packet path has started with the old
      table information, the sequence counter will be odd and the iptables
      replacement will wait until the sequence count is even before freeing the
      old table info.
      
      However, this assumes that the new table information assignment and the
      memory barrier are actually executed before the counter check in the
      replacement thread. Because nothing uses the table information before the
      sequence check, the CPU may execute the assignment later. If it does, the
      packet path on another CPU may use the old table information. The
      replacement thread would then free the table information from under it,
      leading to a use-after-free in the packet processing context:
      
      Unable to handle kernel NULL pointer dereference at virtual
      address 000000000000008e
      pc : ip6t_do_table+0x5d0/0x89c
      lr : ip6t_do_table+0x5b8/0x89c
      ip6t_do_table+0x5d0/0x89c
      ip6table_filter_hook+0x24/0x30
      nf_hook_slow+0x84/0x120
      ip6_input+0x74/0xe0
      ip6_rcv_finish+0x7c/0x128
      ipv6_rcv+0xac/0xe4
      __netif_receive_skb+0x84/0x17c
      process_backlog+0x15c/0x1b8
      napi_poll+0x88/0x284
      net_rx_action+0xbc/0x23c
      __do_softirq+0x20c/0x48c
      
      This could be fixed by forcing instruction order after the new table
      information assignment or by switching to RCU for the synchronization.
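
      A minimal sketch of the first option, forcing the ordering with a full
      barrier, modelled with C11 atomics in userspace. The patch itself
      switches to RCU instead. struct xt_tbl, packet_path() and
      replace_table() are hypothetical names, and a single counter stands in
      for the per-CPU sequence counts. The fences on both sides forbid the
      store->load reordering that a write barrier alone allows.

```c
#include <stdatomic.h>

/* Hypothetical model using C11 atomics rather than kernel primitives.
 * One counter stands in for a CPU's sequence count. */
struct xt_tbl { int rules; };

static _Atomic(struct xt_tbl *) cur_table;
static atomic_uint cpu_seq;

/* Packet path: the counter is odd while the table may be in use. */
static int packet_path(void)
{
    atomic_fetch_add_explicit(&cpu_seq, 1, memory_order_relaxed); /* odd */
    /* Full fence: the counter update must be ordered before the table
     * pointer is read (a store->load ordering). */
    atomic_thread_fence(memory_order_seq_cst);
    struct xt_tbl *t = atomic_load_explicit(&cur_table, memory_order_acquire);
    int rules = t->rules;
    atomic_fetch_add_explicit(&cpu_seq, 1, memory_order_release); /* even */
    return rules;
}

/* Replacement: publish the new table, then wait until no packet can still
 * be using the old one before returning it to be freed. */
static struct xt_tbl *replace_table(struct xt_tbl *newt)
{
    struct xt_tbl *old = atomic_exchange_explicit(&cur_table, newt,
                                                  memory_order_release);
    /* A write barrier only orders stores against stores. Without this
     * full fence the load of cpu_seq below could be satisfied before the
     * new pointer is visible to other CPUs. */
    atomic_thread_fence(memory_order_seq_cst);
    while (atomic_load_explicit(&cpu_seq, memory_order_acquire) & 1u)
        ; /* a packet is mid-traversal: spin */
    return old;
}
```

      This sketch only spins while the counter is odd. A real implementation
      should also cope with a CPU that keeps starting new traversals, which
      the sketch ignores.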
      
      Fixes: 80055dab ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
      Reported-by: Sean Tranchetti <stranche@codeaurora.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 05 Dec 2020, 5 commits
      seg6: add VRF support for SRv6 End.DT6 behavior · 20a081b7
      Committed by Andrea Mayer
      SRv6 End.DT6 is defined in the SRv6 Network Programming draft [1].
      
      The Linux kernel already offers an implementation of the SRv6
      End.DT6 behavior which permits IPv6 L3 VPNs over SRv6 networks. This
      implementation is not particularly suitable in contexts where we need to
      deploy IPv6 L3 VPNs among different tenants which share the same network
      address schemes. The underlying problem lies in the fact that the
      current version of DT6 (called legacy DT6 from now on) needs a complex
      configuration to be applied on routers which requires ad-hoc routes and
      routing policy rules to ensure the correct isolation of tenants.
      
      Consequently, a new implementation of DT6 has been introduced with the
      aim of simplifying the construction of IPv6 L3 VPN services in the
      multi-tenant environment using SRv6 networks. To accomplish this task,
      we reused the same VRF infrastructure and SRv6 core components already
      exploited for implementing the SRv6 End.DT4 behavior.
      
      Currently the two End.DT6 implementations coexist seamlessly and can be
      used depending on the context and the user preferences. So, in order to
      support both versions of DT6 a new attribute (vrftable) has been
      introduced which allows us to differentiate the implementation of the
      behavior to be used.
      
      An SRv6 End.DT6 legacy behavior is still instantiated using a command
      like the following one:

       $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table 100 dev eth0

      To instantiate SRv6 End.DT6 in VRF mode, the command is equally
      straightforward:

       $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 vrftable 100 dev eth0
      
      Obviously as in the case of SRv6 End.DT4, the VRF strict_mode parameter
      must be set (net.vrf.strict_mode=1) and the VRF associated with table
      100 must exist.
      
      Please note that the instances of SRv6 End.DT6 legacy and End.DT6 VRF
      mode can coexist in the same system/configuration without problems.
      
      [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming

      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      seg6: add support for the SRv6 End.DT4 behavior · 664d6f86
      Committed by Andrea Mayer
      SRv6 End.DT4 is defined in the SRv6 Network Programming draft [1].

      The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in
      multi-tenant environments. It decapsulates the received packets and
      performs an IPv4 routing lookup in the routing table of the tenant.
      
      The SRv6 End.DT4 Linux implementation leverages a VRF device in order to
      force the routing lookup into the associated routing table.
      
      To make End.DT4 work properly, it must be guaranteed that the routing
      table used for routing lookup operations is bound to one and only one
      VRF during the tunnel creation. This constraint has to be enforced by
      enabling the VRF strict_mode sysctl parameter, i.e.:

       $ sysctl -wq net.vrf.strict_mode=1
      
      At JANOG44, LINE corporation presented their multi-tenant DC architecture
      using SRv6 [2]. In the slides, they reported that the Linux kernel is
      missing support for the SRv6 End.DT4 behavior.
      
      The SRv6 End.DT4 behavior can be instantiated using a command similar to
      the following:
      
       $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0
      
      We introduce the "vrftable" extension in iproute2 in a following patch.
      
      [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming
      [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6

      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      seg6: add callbacks for customizing the creation/destruction of a behavior · cfdf64a0
      Committed by Andrea Mayer
      We introduce two callbacks used for customizing the creation/destruction of
      an SRv6 behavior. Such callbacks are defined in the new struct
      seg6_local_lwtunnel_ops and hereafter we provide a brief description of
      them:
      
       - build_state(...): used for calling the custom constructor of the
         behavior during its initialization phase and after all the attributes
         have been parsed successfully;
      
       - destroy_state(...): used for calling the custom destructor of the
         behavior before it is completely destroyed.
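
      A minimal sketch of how such optional hooks can be wired, assuming
      simplified types. struct behavior_ops stands in for
      seg6_local_lwtunnel_ops, and every other name is hypothetical. A
      behavior without an ops table simply skips both steps.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the seg6local types named above. */
struct behavior_state {
    void *priv; /* behavior-specific data */
};

struct behavior_ops {
    /* Called after every attribute has been parsed successfully. */
    int (*build_state)(struct behavior_state *st);
    /* Called just before the behavior is destroyed. */
    void (*destroy_state)(struct behavior_state *st);
};

struct behavior_desc {
    const char *name;
    const struct behavior_ops *ops; /* NULL: no custom hooks */
};

static int behavior_create(const struct behavior_desc *d,
                           struct behavior_state *st)
{
    /* Attribute parsing would happen here first. */
    if (d->ops && d->ops->build_state)
        return d->ops->build_state(st);
    return 0;
}

static void behavior_destroy(const struct behavior_desc *d,
                             struct behavior_state *st)
{
    if (d->ops && d->ops->destroy_state)
        d->ops->destroy_state(st);
}

/* Example behavior whose private data lives between the two hooks. */
static int demo_build(struct behavior_state *st)
{
    st->priv = malloc(16);
    return st->priv ? 0 : -12; /* -ENOMEM */
}

static void demo_destroy(struct behavior_state *st)
{
    free(st->priv);
    st->priv = NULL;
}

static const struct behavior_ops demo_ops = { demo_build, demo_destroy };
```
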
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      seg6: add support for optional attributes in SRv6 behaviors · 0a3021f1
      Committed by Andrea Mayer
      Before this patch, each SRv6 behavior specifies a set of required
      attributes that must be provided by the userspace application when the
      behavior is instantiated. If at least one of the required
      attributes is not provided, the creation of the behavior fails.
      
      The SRv6 behavior framework lacks a way to manage optional attributes.
      By definition, an optional attribute of an SRv6 behavior is an attribute
      that userspace may or may not provide. Therefore,
      if an optional attribute is missing (and thus not supplied by the user)
      the creation of the behavior goes ahead without any issue.
      
      This patch explicitly differentiates the required attributes from the
      optional attributes. In particular, each behavior can declare a set of
      required attributes and a set of optional ones.
      
      The semantics of the required attributes remain *totally* unaffected by
      this patch. The introduction of the optional attributes does NOT affect
      the backward compatibility of the existing SRv6 behaviors.
      
      It is essential to note that if an (optional or required) attribute is
      supplied to an SRv6 behavior that does not expect it, the behavior
      simply discards that attribute without generating any error or warning.
      This mode of operation is the same before and after the introduction of
      the optional attributes extension.
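
      The rules above can be sketched as a bitmask check. The ATTR_* bits,
      struct behavior_spec and select_attrs() are hypothetical. The kernel
      uses its own attribute identifiers and data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits, one per attribute kind. */
enum {
    ATTR_TABLE    = 1 << 0,
    ATTR_VRFTABLE = 1 << 1,
    ATTR_NH6      = 1 << 2,
};

struct behavior_spec {
    uint32_t required; /* must all be supplied */
    uint32_t optional; /* may be absent without error */
};

/* Returns false if any required attribute is missing. Supplied attributes
 * the behavior does not expect are silently discarded. On success, *used
 * holds the attributes that will actually be parsed. */
static bool select_attrs(const struct behavior_spec *spec,
                         uint32_t supplied, uint32_t *used)
{
    if ((supplied & spec->required) != spec->required)
        return false;
    *used = supplied & (spec->required | spec->optional);
    return true;
}
```

      With NH6 required and TABLE optional, supplying only NH6 succeeds.
      Supplying NH6, TABLE and VRFTABLE also succeeds, with VRFTABLE ignored.
      Supplying only TABLE fails.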
      
      The optional attributes are one of the key components used to implement
      the SRv6 End.DT6 behavior based on the Virtual Routing and Forwarding
      (VRF) framework. The optional attributes make possible the coexistence
      of the already existing SRv6 End.DT6 implementation with the new SRv6
      End.DT6 VRF-based implementation without breaking any backward
      compatibility. Further details on the SRv6 End.DT6 behavior (VRF mode)
      are reported in subsequent patches.
      
      From the userspace point of view, support for optional attributes does
      NOT require any changes to userspace applications (e.g., iproute2)
      unless new attributes (required or optional) are needed.
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      seg6: improve management of behavior attributes · 964adce5
      Committed by Andrea Mayer
      Depending on the attribute (e.g., SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc.),
      the parse() callback performs some validity checks on the provided input
      and updates the tunnel state (slwt) with the result of the parsing
      operation. However, an attribute may also need to reserve some additional
      resources (e.g., memory or an eBPF program) in the parse() callback to
      complete the parsing operation.
      
      The parse() callbacks are invoked by parse_nla_action() for each
      attribute belonging to a specific behavior. Given a behavior with N
      attributes, if the parsing of the i-th attribute fails,
      parse_nla_action() returns immediately with an error. However, the
      resources acquired while parsing the preceding i-1 attributes are not
      freed by parse_nla_action().
      
      Attributes that acquire resources must release them *explicitly* in
      both seg6_local_{build/destroy}_state(). As a result, adding a new
      attribute of this type requires changes to
      seg6_local_{build/destroy}_state() to release the resources correctly.
      
      The seg6local infrastructure still lacks a simple and structured way to
      release the resources acquired in the parse() operations.
      
      We introduce a new callback in struct seg6_action_param named
      destroy(). This callback releases any resource which may have been acquired
      in the parse() counterpart. Each attribute may or may not implement the
      destroy() callback depending on whether it needs to free some acquired
      resources.
      
      The destroy() callback comes with several advantages:

       1) we can have as many attributes as we want for a given behavior with
          no need to explicitly free the taken resources;

       2) as in the case of seg6_local_build_state(),
          seg6_local_destroy_state() does not need to handle the release of
          resources directly. Instead, it calls the destroy_attrs() function,
          which is in charge of calling the destroy() callback for every set
          attribute. We no longer need to patch
          seg6_local_{build/destroy}_state() as we add new attributes;

       3) the code is more readable and better structured, because all the
          information needed to handle a given attribute is contained in
          one place;

       4) it facilitates the integration with new features introduced in
          further patches.
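
      A minimal sketch of the parse()/destroy() pairing, assuming simplified
      types. struct attr_param, parse_attrs() and destroy_attrs() are
      hypothetical stand-ins; the last one only borrows its name from the
      text above. In this sketch, a failure rolls back only the attributes
      that were already parsed, in reverse order.

```c
#include <stdlib.h>

#define MAX_ATTRS 8

/* Hypothetical model of per-attribute parse()/destroy() pairs. */
struct attr_state {
    void *res[MAX_ATTRS]; /* resource acquired by each attribute, if any */
};

struct attr_param {
    int (*parse)(struct attr_state *st, int idx);
    void (*destroy)(struct attr_state *st, int idx); /* may be NULL */
};

/* Release attributes [0, n) in reverse order of acquisition. */
static void destroy_attrs(const struct attr_param *p, int n,
                          struct attr_state *st)
{
    while (n-- > 0) {
        if (p[n].destroy)
            p[n].destroy(st, n);
    }
}

/* Parse in order; if the i-th attribute fails, undo the first i. */
static int parse_attrs(const struct attr_param *p, int n,
                       struct attr_state *st)
{
    for (int i = 0; i < n; i++) {
        int err = p[i].parse(st, i);
        if (err) {
            destroy_attrs(p, i, st);
            return err;
        }
    }
    return 0;
}

static int live; /* resources currently held, for demonstration */

static int alloc_parse(struct attr_state *st, int idx)
{
    st->res[idx] = malloc(32);
    if (!st->res[idx])
        return -12; /* -ENOMEM */
    live++;
    return 0;
}

static void free_destroy(struct attr_state *st, int idx)
{
    free(st->res[idx]);
    st->res[idx] = NULL;
    live--;
}

static int fail_parse(struct attr_state *st, int idx)
{
    (void)st;
    (void)idx;
    return -22; /* e.g. -EINVAL */
}
```
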
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  11. 04 Dec 2020, 1 commit
      tcp: merge 'init_req' and 'route_req' functions · 7ea851d1
      Committed by Florian Westphal
      The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send
      a TCP reset if the token in a MP_JOIN request is unknown.
      
      At this time we don't do this: the three-way handshake completes and the
      'new subflow' is reset afterwards. There are two ways to allow MPTCP to
      send the reset.
      
      1. Override the 'send_synack' callback and emit the RST from there.
         The drawback is that the request socket gets inserted into the
         listener's queue just to get removed again right away.
      
      2. Send the reset from the 'route_req' function instead.
         This avoids the 'add&remove request socket', but route_req lacks the
         skb that is required to send the TCP reset.
      
      Instead of adding the skb to that function for MPTCP's sake alone,
      Paolo suggested merging the init_req and route_req functions.

      This saves one indirection in the SYN processing path and provides the
      skb to the merged function at the same time.

      'send reset on unknown mptcp join token' is added in the next patch.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  12. 03 Dec 2020, 2 commits
  13. 02 Dec 2020, 1 commit
      net/ipv6: propagate user pointer annotation · 9e39394f
      Committed by Lukas Bulwahn
      For IPV6_2292PKTOPTIONS, do_ipv6_getsockopt() stores the user pointer
      optval in the msg_control field of the msghdr.
      
      Hence, sparse rightfully warns at ./net/ipv6/ipv6_sockglue.c:1151:33:
      
        warning: incorrect type in assignment (different address spaces)
            expected void *msg_control
            got char [noderef] __user *optval
      
      Since commit 1f466e1f ("net: cleanly handle kernel vs user buffers for
      ->msg_control"), user pointers shall be stored in the msg_control_user
      field, and kernel pointers in the msg_control field. This allows __user
      annotations to propagate cleanly through this struct.
      
      Store optval in msg_control_user to properly record and propagate the
      memory space annotation of this pointer.
      
      Note that msg_control_is_user is set to true, so the key invariant, i.e.,
      use msg_control_user if and only if msg_control_is_user is true, holds.
      
      The msghdr is then used in the six alternative put_cmsg() calls. Since
      msg_control_is_user is true, put_cmsg() picks msg_control_user,
      preserving the __user annotation, and passes it properly to
      copy_to_user().
      
      No functional change. No change in object code.
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20201127093421.21673-1-lukas.bulwahn@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 01 Dec 2020, 1 commit
  15. 26 Nov 2020, 1 commit
      ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init · e255e11e
      Committed by Wang Hai
      kmemleak reports a memory leak as follows:
      
      unreferenced object 0xffff8880059c6a00 (size 64):
        comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
        hex dump (first 32 bytes):
          20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00   ...............
          1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00  ................
        backtrace:
          [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0
          [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170
          [<000000006a9ca9d4>] ops_init+0xa8/0x3c0
          [<000000002da57bf2>] setup_net+0x2de/0x7e0
          [<000000004e52d573>] copy_net_ns+0x27d/0x530
          [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30
          [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0
          [<0000000030653721>] ksys_unshare+0x3a4/0x780
          [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40
          [<0000000031a10c08>] do_syscall_64+0x33/0x40
          [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We should free all rules when we catch an error in ip6addrlbl_net_init();
      otherwise a memory leak will occur.
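
      A minimal sketch of the error-path pattern described above, using a
      simple linked list. struct label_table, table_init() and the fail_at
      test hook are hypothetical and do not reproduce ip6addrlbl_net_init().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-namespace address label list. */
struct label_rule {
    struct label_rule *next;
    int label;
};

struct label_table { struct label_rule *head; };

static void table_free_all(struct label_table *t)
{
    struct label_rule *r = t->head;
    while (r) {
        struct label_rule *next = r->next;
        free(r);
        r = next;
    }
    t->head = NULL;
}

static int table_add(struct label_table *t, int label)
{
    struct label_rule *r = malloc(sizeof(*r));
    if (!r)
        return -12; /* -ENOMEM */
    r->label = label;
    r->next = t->head;
    t->head = r;
    return 0;
}

/* Add the default rules. On the first error, free every rule added so far
 * instead of leaking it. fail_at is a hypothetical test hook that forces
 * entry fail_at to fail; pass -1 to disable it. */
static int table_init(struct label_table *t, const int *labels, int n,
                      int fail_at)
{
    t->head = NULL;
    for (int i = 0; i < n; i++) {
        int err = (i == fail_at) ? -12 : table_add(t, labels[i]);
        if (err) {
            table_free_all(t);
            return err;
        }
    }
    return 0;
}

static int table_len(const struct label_table *t)
{
    int n = 0;
    for (const struct label_rule *r = t->head; r; r = r->next)
        n++;
    return n;
}
```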
      
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Link: https://lore.kernel.org/r/20201124071728.8385-1-wanghai38@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  16. 25 Nov 2020, 1 commit
      tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN · 407c85c7
      Committed by Alexander Duyck
      When a BPF program is used to select a TCP congestion control algorithm
      that may or may not use ECN, there is a case where the synack was sent
      without the ECT0 bit set. A bit of research found that this was due to
      the final socket being configured to dctcp while the listener socket
      stayed in cubic.
      
      To reproduce it, all that is needed is to monitor TCP traffic while
      running the sample BPF program "samples/bpf/tcp_cong_kern.c". Assuming
      the tcp_dctcp module is loaded or compiled in and the traffic matches
      the rules in the sample file, the ECT0 bit is set on all frames except
      the synack.
      
      To address that it is necessary to make one additional call to
      tcp_bpf_ca_needs_ecn using the request socket and then use the output of
      that to set the ECT0 bit for the tos/tclass of the packet.
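
      A minimal sketch of the marking step, assuming the RFC 3168 layout of
      the ECN field. synack_tos() is hypothetical, and its needs_ecn parameter
      stands in for the result of calling tcp_bpf_ca_needs_ecn() on the
      request socket.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN field: the two low-order bits of the IPv4 TOS / IPv6 Traffic Class
 * octet (RFC 3168). Codepoint 10 is ECT(0). */
#define ECN_MASK  0x03u
#define ECN_ECT_0 0x02u

/* Hypothetical helper: needs_ecn stands for what a per-request congestion
 * control check would report; here it is simply a parameter. */
static uint8_t synack_tos(uint8_t tos, bool needs_ecn)
{
    /* Mark ECT(0) only if no ECN codepoint is set yet. */
    if ((tos & ECN_MASK) == 0 && needs_ecn)
        tos = (uint8_t)(tos | ECN_ECT_0);
    return tos;
}
```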
      
      Fixes: 91b5b21c ("bpf: Add support for changing congestion control")
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  17. 24 Nov 2020, 2 commits
      tcp: fix race condition when creating child sockets from syncookies · 01770a16
      Committed by Ricardo Dias
      When the TCP stack is in SYN flood mode, the server child socket is
      created from the SYN cookie received in a TCP packet with the ACK flag
      set.
      
      The child socket is created when the server receives the first TCP
      packet with a valid SYN cookie from the client. Usually, this packet
      corresponds to the final step of the TCP 3-way handshake, the ACK
      packet. But it is also possible to receive a valid SYN cookie in the
      first TCP data packet sent by the client, and thus create a child socket
      from that SYN cookie.
      
      Since a client socket is ready to send data as soon as it receives the
      SYN+ACK packet from the server, the client can send the ACK packet (sent
      by the TCP stack code), and the first data packet (sent by the userspace
      program) almost at the same time. The server may then receive the two
      TCP packets, both carrying valid SYN cookies, almost at the same
      instant.
      
      When such an event happens, the TCP stack code has a race condition that
      occurs between the moment a lookup is done in the established
      connections hashtable to check for an existing connection from the
      same client, and the moment that the child socket is added to the
      established connections hashtable. As a consequence, this race condition
      can lead to a situation where we add two child sockets to the
      established connections hashtable and deliver two sockets for the same
      client to the userspace program.
      
      This patch fixes the race condition by checking whether a child socket
      already exists for the same client when we are adding the second child
      socket to the established connections hashtable. If one exists, we drop
      the packet and discard the second child socket for the same client.
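
      A minimal sketch of the idea, assuming a tiny fixed-size table guarded
      by a spinlock. Doing the lookup and the insert under one lock closes the
      window between them. struct est_table and est_insert_unique() are
      hypothetical. The kernel's established hashtable is structured
      differently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "established" table keyed by a hash of the connection
 * 4-tuple. Names and layout are illustrative only. */
#define EST_SLOTS 64

struct est_table {
    atomic_flag lock;
    bool used[EST_SLOTS];
    uint32_t keys[EST_SLOTS];
};

static void est_init(struct est_table *t)
{
    atomic_flag_clear(&t->lock);
    memset(t->used, 0, sizeof(t->used));
}

static void est_lock(struct est_table *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ; /* spin */
}

static void est_unlock(struct est_table *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* The lookup and the insert happen under one lock, so two packets with
 * valid cookies for the same client cannot both add a child socket.
 * Returns false if the key is already present (the caller then drops the
 * packet and discards its child) or if the table is full. */
static bool est_insert_unique(struct est_table *t, uint32_t key)
{
    int free_slot = -1;
    bool inserted = false;

    est_lock(t);
    for (int i = 0; i < EST_SLOTS; i++) {
        if (t->used[i] && t->keys[i] == key)
            goto out; /* duplicate: keep the first child */
        if (!t->used[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        t->keys[free_slot] = key;
        t->used[free_slot] = true;
        inserted = true;
    }
out:
    est_unlock(t);
    return inserted;
}
```
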
      Signed-off-by: Ricardo Dias <rdias@singlestore.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      lsm,selinux: pass flowi_common instead of flowi to the LSM hooks · 3df98d79
      Committed by Paul Moore
      As pointed out by Herbert in a recent related patch, the LSM hooks do
      not have the necessary address family information to use the flowi
      struct safely.  As none of the LSMs currently use any of the protocol
      specific flowi information, replace the flowi pointers with pointers
      to the address family independent flowi_common struct.
      Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  18. 21 Nov 2020, 1 commit
      tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header · 861602b5
      Committed by Alexander Duyck
      An issue was recently found where DCTCP SYN/ACK packets did not have the
      ECT bit set in the L3 header. A bit of code review found that the recent
      change referenced below had added a mask that prevented the
      ECN bits from being populated in the L3 header.

      This patch addresses that by rolling back the mask so that it is only
      applied to the flags coming from the incoming TCP request instead of
      applying it to the socket tos/tclass field. With this change, the ECT
      bits were restored in the SYN/ACK packets in my testing.
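
      A minimal sketch of where the mask now applies, assuming
      tcp_reflect_tos is enabled and the RFC 3168 ECN layout. reflect_tos()
      is a hypothetical helper, not the patch's code.

```c
#include <stdint.h>

#define ECN_MASK 0x03u /* low two bits of TOS / Traffic Class (RFC 3168) */

/* Hypothetical helper for the tcp_reflect_tos case: take the DSCP bits
 * from the client's SYN but keep the ECN bits from the listening socket.
 * Masking the final value instead would also clear ECT bits that the
 * socket (e.g. one using DCTCP) had set. */
static uint8_t reflect_tos(uint8_t syn_tos, uint8_t sk_tos)
{
    return (uint8_t)((syn_tos & ~ECN_MASK) | (sk_tos & ECN_MASK));
}
```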
      
      One thing that is not addressed by this patch set is the fact that
      tcp_reflect_tos appears to be incompatible with ECN-based congestion
      avoidance algorithms. At a minimum, the feature should probably be
      documented, which it currently isn't.
      
      Fixes: ac8f1710 ("tcp: reflect tos value received in SYN to the socket")
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  19. 20 Nov 2020, 3 commits
  20. 19 Nov 2020, 1 commit
  21. 17 Nov 2020, 1 commit
      ipv6/netfilter: Discard first fragment not including all headers · 9d9e937b
      Committed by Georg Kohmann
      Packets are processed even though the first fragment doesn't include all
      headers through the upper-layer header. This breaks TAHI IPv6 Core
      Conformance Test v6LC.1.3.6.
      
      Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
      all headers through an Upper-Layer header, then that fragment should be
      discarded and an ICMP Parameter Problem, Code 3, message should be sent to
      the source of the fragment, with the Pointer field set to zero."
      
      The fragment needs to be validated the same way it is done in
      commit 2efdaaaf ("IPv6: reply ICMP error if the first fragment don't
      include all headers") for IPv6. Wrap the validation into a common
      function, ipv6_frag_thdr_truncated(), to check for truncation in the
      upper-layer header. This validation does not fulfill all aspects of
      RFC 8200, section 4.5, but is currently sufficient to pass the
      mentioned TAHI test.
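
      A minimal sketch of the truncation test, assuming the offset of the
      upper-layer header has already been found by walking the extension
      headers. first_frag_truncated() and upper_hdr_min_len() are
      hypothetical, and the minimum header sizes are this sketch's own
      choices rather than values taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 Next Header values (IANA protocol numbers). */
#define NH_TCP     6
#define NH_UDP    17
#define NH_ICMPV6 58

/* Minimum upper-layer header sizes assumed by this sketch. */
static size_t upper_hdr_min_len(uint8_t nexthdr)
{
    switch (nexthdr) {
    case NH_TCP:    return 20; /* TCP header without options */
    case NH_UDP:    return 8;
    case NH_ICMPV6: return 4;  /* type, code, checksum (RFC 4443) */
    default:        return 1;  /* unknown: require at least one byte */
    }
}

/* upper_off: offset of the upper-layer header, found by walking the
 * extension-header chain (not shown). pkt_len: bytes in the first
 * fragment. True means the fragment should be discarded. */
static bool first_frag_truncated(size_t upper_off, uint8_t nexthdr,
                                 size_t pkt_len)
{
    return upper_off + upper_hdr_min_len(nexthdr) > pkt_len;
}
```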
      
      In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
      let ipv6_frag_thdr_truncated() start its traversal from the fragment
      header.
      
      Return 0 to drop the fragment in netfilter. This is the same behaviour
      as used on other protocol errors in this function, e.g. when
      nf_ct_frag6_queue() returns -EPROTO. The fragment will later be picked up
      by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
      appropriate ICMP Parameter Problem message back to the source.
      
      References commit 2efdaaaf ("IPv6: reply ICMP error if the first
      fragment don't include all headers")
      Signed-off-by: Georg Kohmann <geokohma@cisco.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  22. 15 Nov 2020, 2 commits
  23. 14 Nov 2020, 2 commits
  24. 13 Nov 2020, 1 commit
  25. 11 Nov 2020, 2 commits
  26. 10 Nov 2020, 3 commits