1. 04 Aug, 2020 14 commits
  2. 02 Aug, 2020 9 commits
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · bd0b33b2
      David S. Miller committed
      Resolved kernel/bpf/btf.c using instructions from merge commit
      69138b34
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd0b33b2
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · ac3a0c84
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca.
      
       2) Better parameter validation in pfkey_dump(), from Mark Salyzyn.
      
       3) Fix several clang issues on powerpc in selftests, from Tanner Love.
      
       4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al
          Viro.
      
       5) Out of bounds access in mlx5e driver, from Raed Salem.
      
       6) Fix transfer buffer memleak in lan78xx, from Johan Hovold.
      
       7) RCU fixups in rhashtable, from Herbert Xu.
      
       8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang.
      
       9) vxlan FDB dump must be done under RCU, from Ido Schimmel.
      
      10) Fix use after free in mlxsw, from Ido Schimmel.
      
      11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko.
      
      12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar
          Thiagarajan.
      
      13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang.
      
      14) Fix bpf program reference count leaks in mlx5 during
          mlx5e_alloc_rq(), from Xin Xiong.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
        vxlan: fix memleak of fdb
        rds: Prevent kernel-infoleak in rds_notify_queue_get()
        net/sched: The error lable position is corrected in ct_init_module
        net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq
        net/mlx5e: E-Switch, Specify flow_source for rule with no in_port
        net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring
        net/mlx5e: CT: Support restore ipv6 tunnel
        net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe()
        ionic: unlock queue mutex in error path
        atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
        net: ethernet: mtk_eth_soc: fix MTU warnings
        net: nixge: fix potential memory leak in nixge_probe()
        devlink: ignore -EOPNOTSUPP errors on dumpit
        rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
        MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer
        selftests/bpf: fix netdevsim trap_flow_action_cookie read
        ipv6: fix memory leaks on IPV6_ADDRFORM path
        net/bpfilter: Initialize pos in __bpfilter_process_sockopt
        igb: reinit_locked() should be called with rtnl_lock
        e1000e: continue to init PHY even when failed to disable ULP
        ...
      ac3a0c84
    • L
      Merge tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 0ae3495b
      Linus Torvalds committed
      Pull thread fix from Christian Brauner:
       "A simple spelling fix for dequeue_synchronous_signal()"
      
      * tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        signal: fix typo in dequeue_synchronous_signal()
      0ae3495b
    • L
      Merge tag 'perf-tools-fixes-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · bf121a0b
      Linus Torvalds committed
      Pull perf tooling fixes from Arnaldo Carvalho de Melo:
      
       - Fix libtraceevent build with binutils 2.35
      
       - Fix memory leak in process_dynamic_array_len in libtraceevent
      
       - Fix 'perf test 68' zstd compression for s390
      
       - Fix record failure when mixed with ARM SPE event
      
      * tag 'perf-tools-fixes-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        libtraceevent: Fix build with binutils 2.35
        perf tools: Fix record failure when mixed with ARM SPE event
        perf tests: Fix test 68 zstd compression for s390
        tools lib traceevent: Fix memory leak in process_dynamic_array_len
      bf121a0b
    • F
      mptcp: fix syncookie build error on UP · 7126bd5c
      Florian Westphal committed
      kernel test robot says:
      net/mptcp/syncookies.c: In function 'mptcp_join_cookie_init':
      include/linux/kernel.h:47:38: warning: division by zero [-Wdiv-by-zero]
       #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
      
      I forgot that spinlock_t size is 0 on UP, so ARRAY_SIZE cannot be used.
      
      Fixes: 9466a1cc ("mptcp: enable JOIN requests even if cookies are in use")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7126bd5c
    • T
      vxlan: fix memleak of fdb · fda2ec62
      Taehee Yoo committed
      When a vxlan interface is deleted, all fdbs are deleted by vxlan_flush().
      vxlan_flush() flushes fdbs, but it skips the fdb that contains the
      all-zeros MAC because that entry is deleted by vxlan_uninit().
      But vxlan_uninit() deletes only the fdb that contains both the
      all-zeros MAC and the default VNI.
      So the fdb that contains both the all-zeros MAC and a non-default VNI
      is never deleted.
      
      Test commands:
          ip link add vxlan0 type vxlan dstport 4789 external
          ip link set vxlan0 up
          bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
      	    src_vni 10000 self permanent
          ip link del vxlan0
      
      kmemleak reports as follows:
      unreferenced object 0xffff9486b25ced88 (size 96):
        comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
        hex dump (first 32 bytes):
          02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff  ........@.......
          46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b  F.............jk
        backtrace:
          [<00000000c10cf651>] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
          [<000000006b31a8d9>] vxlan_fdb_create+0x184/0x1a0 [vxlan]
          [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
          [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
          [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
          [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
          [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
          [<00000000dff433e7>] netlink_unicast+0x18e/0x250
          [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
          [<000000002ed55153>] ____sys_sendmsg+0x237/0x260
          [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
          [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
          [<00000000a8f875d2>] do_syscall_64+0x56/0xe0
          [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff9486b1c40080 (size 128):
        comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff  ..........B.....
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        backtrace:
          [<00000000a2981b60>] vxlan_fdb_create+0x67/0x1a0 [vxlan]
          [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
          [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
          [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
          [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
          [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
          [<00000000dff433e7>] netlink_unicast+0x18e/0x250
          [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
          [<000000002ed55153>] ____sys_sendmsg+0x237/0x260
          [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
          [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
          [<00000000a8f875d2>] do_syscall_64+0x56/0xe0
          [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 3ad7a4b1 ("vxlan: support fdb and learning in COLLECT_METADATA mode")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fda2ec62
    • B
      fib: fix another fib_rules_ops indirect call wrapper problem · 8b66a6fd
      Brian Vazquez committed
      It turns out that on commit 41d707b7 ("fib: fix fib_rules_ops
      indirect calls wrappers") I forgot to include the case when
      CONFIG_IP_MULTIPLE_TABLES is not set.
      
      Fixes: 41d707b7 ("fib: fix fib_rules_ops indirect calls wrappers")
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b66a6fd
    • E
      tcp: fix build fong CONFIG_MPTCP=n · 0e8642cf
      Eric Dumazet committed
      Fixes these errors:
      
      net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock':
      net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no
      member named 'drop_req'
        216 |   if (tcp_rsk(req)->drop_req) {
            |                   ^~
      net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc':
      net/ipv4/syncookies.c:289:27: warning: unused variable 'treq'
      [-Wunused-variable]
        289 |  struct tcp_request_sock *treq;
            |                           ^~~~
      make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1
      make[3]: *** Waiting for unfinished jobs....
      
      Fixes: 9466a1cc ("mptcp: enable JOIN requests even if cookies are in use")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e8642cf
    • L
      Merge tag 'pinctrl-v5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · d52daa86
      Linus Torvalds committed
      Pull pin control fix from Linus Walleij:
       "A single last minute pin control fix to the Qualcomm driver fixing
        missing dual edge PDC interrupts"
      
      * tag 'pinctrl-v5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180
      d52daa86
  3. 01 8月, 2020 17 次提交
    • D
      Merge tag 'mac80211-next-for-davem-2020-07-31' of... · 6f3de75c
      David S. Miller committed
      Merge tag 'mac80211-next-for-davem-2020-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      We have a number of changes
       * code cleanups and fixups as usual
       * AQL & internal TXQ improvements from Felix
       * some mesh 802.1X support bits
       * some injection improvements from Mathy of KRACK
         fame, so we'll see what this results in ;-)
       * some more initial S1G support bits, this time
         (some of?) the userspace APIs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f3de75c
    • R
      rtnetlink: add support for protodown reason · 829eb208
      Roopa Prabhu committed
      netdev protodown is a mechanism that allows protocols to
      hold an interface down. It was initially introduced in
      the kernel to hold links down by a multihoming protocol.
      There was also an attempt to introduce a protodown
      reason at the time, but it was rejected. protodown and protodown reason
      are supported by almost every switching and routing platform.
      It was OK for a while to live without a protodown reason.
      But it has become more critical now, given that more than
      one protocol may need to keep a link down on a system
      at the same time, e.g. vrrp peer node, port security,
      multihoming protocol. It is common for network operators and
      protocol developers to look for such a reason on a networking
      box (it is also known as errDisable by most network operators).
      
      This patch adds support for link protodown reason
      attribute. There are two ways to maintain protodown
      reasons.
      (a) enumerate every possible reason code in kernel
          - A protocol developer has to make a request and
            have that appear in a certain kernel version
      (b) provide the bits in the kernel, and allow user-space
      (sysadmin or NOS distributions) to manage the bit-to-reasonname
      map.
      	- This makes extending reason codes easier (kind of like
            the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)
      
      This patch takes approach (b).
      
      a few things about the patch:
      - It treats the protodown reason bits as counter to indicate
      active protodown users
      - Since the protodown attribute is already an exposed UAPI,
      the reason is not enforced on a protodown set. It is a no-op
      if not used.
      The patch follows the algorithm below:
        - presence of reason bits set indicates protodown
          is in use
        - user can set protodown and protodown reason in a
          single or multiple setlink operations
        - setlink operation to clear protodown, will return -EBUSY
          if there are active protodown reason bits
        - reason is not included in link dumps if not used
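
The rules listed above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct link_model`, `link_set_reason()` and `link_set_protodown()` are hypothetical names invented for illustration and are not the kernel's data structures or functions. It shows the one behaviour that matters, namely that clearing protodown is refused with -EBUSY while any reason bit is still set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a link; not the kernel's struct net_device. */
struct link_model {
    bool     protodown;         /* the protodown flag */
    uint32_t protodown_reason;  /* one bit per user-defined reason */
};

/* Set or clear a single reason bit (0..31). Returns 0 or -EINVAL. */
int link_set_reason(struct link_model *l, unsigned int bit, bool on)
{
    if (bit >= 32)
        return -EINVAL;
    if (on)
        l->protodown_reason |= (UINT32_C(1) << bit);
    else
        l->protodown_reason &= ~(UINT32_C(1) << bit);
    return 0;
}

/* Change protodown; clearing fails with -EBUSY while any reason bit is set. */
int link_set_protodown(struct link_model *l, bool on)
{
    if (!on && l->protodown_reason != 0)
        return -EBUSY;
    l->protodown = on;
    return 0;
}
```

With this model, a caller that set the bits for vrrp and mlag has to clear both before `link_set_protodown(&l, false)` succeeds, which mirrors the order of the iproute2 commands in the example.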
      
      example with patched iproute2:
      $cat /etc/iproute2/protodown_reasons.d/r.conf
      0 mlag
      1 evpn
      2 vrrp
      3 psecurity
      
      $ip link set dev vxlan0 protodown on protodown_reason vrrp on
      $ip link set dev vxlan0 protodown_reason mlag on
      $ip link show
      14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
      DEFAULT group default qlen 1000
          link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>
      
      $ip link set dev vxlan0 protodown_reason mlag off
      $ip link set dev vxlan0 protodown off protodown_reason vrrp off
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      829eb208
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 69138b34
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-07-31
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 5 non-merge commits during the last 21 day(s) which contain
      a total of 5 files changed, 126 insertions(+), 18 deletions(-).
      
      The main changes are:
      
      1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.
      
      2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
         btf_vmlinux is available, from Peilin Ye.
      
      3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.
      
      4) Fix a cgroup sockopt verifier test by specifying expected attach type,
         from Jean-Philippe Brucker.
      
      Note that when net gets merged into net-next later on, there is a small
      merge conflict in kernel/bpf/btf.c between commit 5b801dfb ("bpf: Fix
      NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
      and commit 138b9a05 ("bpf: Remove btf_id helpers resolving") from the
      net-next tree.
      
      Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
      function. Change the btf_resolve_helper_id() so it actually tests for a
      NULL btf_vmlinux and bails out:
      
      int btf_resolve_helper_id(struct bpf_verifier_log *log,
                                const struct bpf_func_proto *fn, int arg)
      {
              int id;
      
              if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
                      return -EINVAL;
              id = fn->btf_id[arg];
              if (!id || id > btf_vmlinux->nr_types)
                      return -EINVAL;
              return id;
      }
      
      Let me know if you run into any other issues (CC'ing Jiri Olsa so he's in
      the loop with regards to merge conflict resolution).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69138b34
    • J
      tun: add missing rcu annotation in tun_set_ebpf() · 8f3f330d
      Jason Wang committed
      We expect prog_p to be protected by RCU, so add the rcu annotation
      to fix the following sparse warning:
      
      drivers/net/tun.c:3003:36: warning: incorrect type in argument 2 (different address spaces)
      drivers/net/tun.c:3003:36:    expected struct tun_prog [noderef] __rcu **prog_p
      drivers/net/tun.c:3003:36:    got struct tun_prog **prog_p
      drivers/net/tun.c:3292:42: warning: incorrect type in argument 2 (different address spaces)
      drivers/net/tun.c:3292:42:    expected struct tun_prog **prog_p
      drivers/net/tun.c:3292:42:    got struct tun_prog [noderef] __rcu **
      drivers/net/tun.c:3296:42: warning: incorrect type in argument 2 (different address spaces)
      drivers/net/tun.c:3296:42:    expected struct tun_prog **prog_p
      drivers/net/tun.c:3296:42:    got struct tun_prog [noderef] __rcu **
      Reported-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f3f330d
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 8d46215a
      David S. Miller committed
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-07-31
      
      1) Fix policy matching with mark and mask on userspace interfaces.
         From Xin Long.
      
      2) Several fixes for the new ESP in TCP encapsulation.
         From Sabrina Dubroca.
      
      3) Fix crash when the hold queue is used. The assumption that
         xdst->path and dst->child are not a NULL pointer only if dst->xfrm
         is not a NULL pointer is true with the exception of using the
         hold queue. Fix this by checking for hold queue usage before
         dereferencing xdst->path or dst->child.
      
      4) Validate pfkey_dump parameters before sending them.
         From Mark Salyzyn.
      
      5) Fix the location of the transport header with ESP in UDPv6
         encapsulation. From Sabrina Dubroca.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d46215a
    • D
      Merge tag 'mlx5-fixes-2020-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · e535d87d
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-07-30
      
      This small patchset introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.18:
       ('net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq')
      
      For -stable v5.7:
       ('net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e535d87d
    • Y
      tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS · 48040793
      Yousuk Seung committed
      This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS, which reports
      the earliest departure time (EDT) of the timestamped skb. By tracking EDT
      values of the skb from different timestamps, we can observe when and how
      much the value changed. This allows measuring the precise delay
      injected on the sender host, e.g. by a bpf-based throttler.
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48040793
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · c6886957
      David S. Miller committed
      Tony Nguyen says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2020-07-30
      
      This series contains updates to e100, e1000, e1000e, igb, igbvf, ixgbe,
      ixgbevf, iavf, and driver documentation.
      
      Vaibhav Gupta converts legacy .suspend() and .resume() to generic PM
      callbacks for e100, igbvf, ixgbe, ixgbevf, and iavf.
      
      Suraj Upadhyay replaces 1 byte memsets with assignments for e1000,
      e1000e, igb, and ixgbe.
      
      Alexander Klimov replaces http links with https.
      
      Miaohe Lin replaces uses of memset to clear MAC addresses with
      eth_zero_addr().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6886957
    • D
      Merge branch 'mptcp-syncookies' · d9790bc2
      David S. Miller committed
      Florian Westphal says:
      
      ====================
      mptcp: add syncookie support
      
      Changes in v2:
      - first patch renames req->ts_cookie to req->syncookie instead of
        removing ts_cookie member.
      - patch to add 'want_cookie' arg to init_req() functions has been dropped.
        All users of that arg were changed to check 'req->syncookie' instead.
      
      v1 cover letter:
      
      When syn-cookies are used, the SYN/ACK never contains an MPTCP option,
      because the code path that creates a request socket based on a valid
      cookie ACK lacks the needed changes to construct MPTCP request sockets.
      
      After this series, if SYN carries MP_CAPABLE option, the option is not
      cleared anymore and request socket will be reconstructed using the
      MP_CAPABLE option data that is re-sent with the ACK.
      
      This means that no additional state gets encoded into the syn cookie or
      the TCP timestamp.
      
      There are two caveats for SYN-Cookies with MPTCP:
      
      1. When syn-cookies are used, the server-generated key is not stored.
      The drawback is that the next connection request that comes in before
      the cookie-ACK has a small chance that it will generate the same local_key.
      
      If this happens, the cookie ACK that comes in second will (re)compute the
      token hash and then detect that this is already in use.
      Unlike the normal case, where the server will pick a new key value and then
      retry, we can't do that because we already committed to the key value
      (it was sent to peer already).
      
      In this case, MPTCP cannot be used and late TCP fallback happens.
      
      2. SYN packets with MP_JOIN requests cannot be handled without storing
          state. This is because the SYN contains a nonce value that is needed to
          verify the HMAC of the MP_JOIN ACK that completes the three-way
          handshake.  Also, a local nonce is generated and used in the cookie
          SYN/ACK.
      
      There are only 2 ways to solve this:
       a) Do not support JOINs when cookies are in effect.
       b) Store the nonces somewhere.
      
      The approach chosen here is b).
      Patch 8 adds a fixed-size (1024 entries) state table to store the
      information required to validate the MP_JOIN ACK and re-build the
      request socket.
      
      State gets stored when syn-cookies are active and the token in the JOIN
      request referred to an established MPTCP connection that can also accept
      a new subflow.
      
      State is restored if the ACK cookie is valid, an MP_JOIN option is present
      and the state slot contains valid data from a previous SYN.
      
      After the request socket has been rebuilt, the normal HMAC check is done just
      as without syn cookies.
      
      Largely identical to last RFC, except patch #8 which follows Paolo's
      suggestion to use a private table storage area rather than keeping
      request sockets around.  This also means I dropped the patch to remove
      const qualifier from sk_listener pointers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9790bc2
    • F
      selftests: mptcp: add test cases for mptcp join tests with syn cookies · 00587187
      Florian Westphal committed
      Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e.,
      syncookies are always-on).
      
      While at it, also print the test number and add the test number
      to the pcap files that can be generated optionally.
      
      This makes it easier to match the pcap to the test case.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00587187
    • F
      selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally · fed61c4b
      Florian Westphal committed
      Check that we can also establish connections when syn cookies are in use.
      
      Check that
      MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each
      MPTCP test.
      
      Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fed61c4b
    • F
      mptcp: enable JOIN requests even if cookies are in use · 9466a1cc
      Florian Westphal committed
      JOIN requests do not work in syncookie mode -- for HMAC validation, the
      peer's nonce and the mptcp token (to obtain the desired connection socket
      the join is for) are required, but this information is only present in the
      initial syn.
      
      So either we need to drop all JOIN requests once a listening socket enters
      syncookie mode, or we need to store enough state to reconstruct the request
      socket later.
      
      This adds a state table (1024 entries) to store the data present in the
      MP_JOIN syn request and the random nonce used for the cookie syn/ack.
      
      When an MP_JOIN ACK passes cookie validation, the table is consulted
      to rebuild the request socket from it.
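
A fixed-size state table of this kind can be sketched as follows. Everything here is an illustrative assumption rather than the code from this commit: the entry layout, the `join_store()`/`join_restore()` names, the use of the cookie value as the lookup key, and the multiplicative hash are all hypothetical. Only the table size (1024 entries) and the store-on-SYN, restore-on-ACK flow come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define JOIN_TABLE_SIZE 1024u  /* entry count taken from the commit message */

/* Hypothetical entry layout; the field names are illustrative only. */
struct join_entry {
    uint32_t cookie;       /* key that the ACK must present again */
    uint32_t token;        /* MPTCP token from the MP_JOIN SYN */
    uint32_t peer_nonce;   /* nonce carried by the SYN */
    uint32_t local_nonce;  /* nonce sent in the cookie SYN/ACK */
    bool     valid;
};

static struct join_entry join_table[JOIN_TABLE_SIZE];

/* Map a cookie to a slot; a real implementation would use a keyed hash. */
static unsigned int join_slot(uint32_t cookie)
{
    return (cookie * 2654435761u) % JOIN_TABLE_SIZE;
}

/* SYN path: remember the state, overwriting whatever held the slot before. */
void join_store(uint32_t cookie, uint32_t token,
                uint32_t peer_nonce, uint32_t local_nonce)
{
    struct join_entry *e = &join_table[join_slot(cookie)];

    e->cookie = cookie;
    e->token = token;
    e->peer_nonce = peer_nonce;
    e->local_nonce = local_nonce;
    e->valid = true;
}

/* ACK path: fetch and consume the state. Returns false if the slot is
 * empty or has been overwritten by a different SYN in the meantime. */
bool join_restore(uint32_t cookie, struct join_entry *out)
{
    struct join_entry *e = &join_table[join_slot(cookie)];

    if (!e->valid || e->cookie != cookie)
        return false;
    *out = *e;
    e->valid = false;
    return true;
}
```

Because the table never grows, a burst of JOIN SYNs can overwrite older entries; in this sketch the affected ACK then simply fails to restore its state, so memory use stays bounded regardless of load.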
      
      An alternate approach would be to "cancel" syn-cookie mode and force
      MP_JOIN to always use a syn queue entry.
      
      However, doing so brings the backlog over the configured queue limit.
      
      v2: use req->syncookie, not (removed) want_cookie arg
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9466a1cc
    • F
      tcp: syncookies: create mptcp request socket for ACK cookies with MPTCP option · 6fc8c827
      Florian Westphal committed
      If the SYN packet contains an MP_CAPABLE option, keep it enabled.
      Syncookie validation and cookie-based socket creation are changed to
      instantiate an mptcp request socket if the ACK contains an MPTCP
      connection request.
      
      Rather than extend both cookie_v4/6_check, add a common helper to create
      the (mp)tcp request socket.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fc8c827
    • F
      mptcp: subflow: add mptcp_subflow_init_cookie_req helper · c83a47e5
      Florian Westphal committed
      Will be used to initialize the mptcp request socket when a MP_CAPABLE
      request was handled in syncookie mode, i.e. when a TCP ACK containing a
      MP_CAPABLE option is a valid syncookie value.
      
      Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
      ID and store it in the MPTCP token storage to be able to retrieve the
      mptcp socket for subflow joining.
      
      In syncookie case, we do not want to store any state, so just generate the
      unique ID and use it in the reply.
      
      This means there is a small window where another connection could generate
      the same token.
      
      When Cookie ACK comes back, we check that the token has not been registered
      in the mean time.  If it was, the connection needs to fall back to TCP.
      
      Changes in v2:
       - use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
         (Eric Dumazet)
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c83a47e5
    • F
      mptcp: rename and export mptcp_subflow_request_sock_ops · 08b8d080
      Florian Westphal committed
      syncookie code path needs to create an mptcp request sock.
      
      Prepare for this and add mptcp prefix plus needed export of ops struct.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08b8d080
    • F
      mptcp: subflow: split subflow_init_req · 78d8b7bc
      Florian Westphal committed
      When syncookie support is added, we will need to add a variant of
      the subflow_init_req() helper.  It will do almost the same thing except
      that it will not compute/add a token to the mptcp token tree.
      
      To avoid excess copy&paste, this commit splits away part of the
      code into a new helper, __subflow_init_req, that can then be re-used
      from the 'no insert' function added in a followup change.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78d8b7bc
    • F
      mptcp: token: move retry to caller · 535fb815
      Florian Westphal committed
      Once syncookie support is added, no state will be stored anymore when the
      syn/ack is generated in syncookie mode.
      
      When the ACK comes back, the generated key will be taken from the TCP ACK,
      the token is re-generated and inserted into the token tree.
      
      This means we can't retry with a new key when the token is already taken
      in the syncookie case.
      
      Therefore, move the retry logic to the caller to prepare for syncookie
      support in mptcp.
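
The reasoning above can be illustrated with a toy token store in which the insert function reports a clash and each caller decides what to do about it. This is a sketch under assumptions, not the kernel code: `token_insert()`, `new_request()`, `cookie_request()`, the XOR-fold "hash" and the fixed candidate-key list standing in for random key generation are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TOKENS 8

static uint32_t used_tokens[MAX_TOKENS];
static int used_count;

/* Stand-in for deriving a token from a key (the real code hashes the key). */
static uint32_t token_from_key(uint64_t key)
{
    return (uint32_t)(key ^ (key >> 32));
}

/* Insert once, without retrying: -EBUSY if the token is already in use,
 * -ENOSPC if this toy table is full, 0 on success. */
int token_insert(uint64_t key)
{
    uint32_t tok = token_from_key(key);

    for (int i = 0; i < used_count; i++)
        if (used_tokens[i] == tok)
            return -EBUSY;
    if (used_count == MAX_TOKENS)
        return -ENOSPC;
    used_tokens[used_count++] = tok;
    return 0;
}

/* Normal path: the caller owns the retry loop and may move on to a new key.
 * The candidate list stands in for generating fresh random keys. */
int new_request(const uint64_t *candidates, int n, uint64_t *key_out)
{
    for (int i = 0; i < n; i++) {
        int err = token_insert(candidates[i]);

        if (err == -EBUSY)
            continue;               /* clash: try the next key */
        if (err == 0)
            *key_out = candidates[i];
        return err;
    }
    return -EBUSY;                  /* every candidate clashed */
}

/* Syncookie path: the key was already sent to the peer, so a clash cannot
 * be retried; false means the connection has to fall back to plain TCP. */
bool cookie_request(uint64_t key_from_ack)
{
    return token_insert(key_from_ack) == 0;
}
```

Keeping `token_insert()` free of any retry logic is what lets the two callers differ: one loops over new keys, the other treats the first clash as final.
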
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      535fb815