1. 28 3月, 2020 6 次提交
  2. 14 3月, 2020 1 次提交
  3. 13 3月, 2020 3 次提交
    • E
      bpf: Add bpf_xdp_output() helper · d831ee84
      Eelco Chaudron 提交于
      Introduce new helper that reuses existing xdp perf_event output
      implementation, but can be called from raw_tracepoint programs
      that receive 'struct xdp_buff *' as a tracepoint argument.
      Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
      d831ee84
    • P
      flow_offload: Add flow_match_ct to get rule ct match · ee1c45e8
      Paul Blakey 提交于
      Add relevant getter for ct info dissector.
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1c45e8
    • J
      Revert "net: sched: make newly activated qdiscs visible" · 7c4046b1
      Julian Wiedmann 提交于
      This reverts commit 4cda7527
      from net-next.
      
      Brown bag time.
      
      Michal noticed that this change doesn't work at all when
      netif_set_real_num_tx_queues() gets called prior to an initial
      dev_activate(), as for instance igb does.
      
      Doing so dies with:
      
      [   40.579142] BUG: kernel NULL pointer dereference, address: 0000000000000400
      [   40.586922] #PF: supervisor read access in kernel mode
      [   40.592668] #PF: error_code(0x0000) - not-present page
      [   40.598405] PGD 0 P4D 0
      [   40.601234] Oops: 0000 [#1] PREEMPT SMP PTI
      [   40.605909] CPU: 18 PID: 1681 Comm: wickedd Tainted: G            E     5.6.0-rc3-ethnl.50-default #1
      [   40.616205] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
      [   40.627377] RIP: 0010:qdisc_hash_add.part.22+0x2e/0x90
      [   40.633115] Code: 00 55 53 89 f5 48 89 fb e8 2f 9b fb ff 85 c0 74 44 48 8b 43 40 48 8b 08 69 43 38 47 86 c8 61 c1 e8 1c 48 83 e8 80 48 8d 14 c1 <48> 8b 04 c1 48 8d 4b 28 48 89 53 30 48 89 43 28 48 85 c0 48 89 0a
      [   40.654080] RSP: 0018:ffffb879864934d8 EFLAGS: 00010203
      [   40.659914] RAX: 0000000000000080 RBX: ffffffffb8328d80 RCX: 0000000000000000
      [   40.667882] RDX: 0000000000000400 RSI: 0000000000000000 RDI: ffffffffb831faa0
      [   40.675849] RBP: 0000000000000000 R08: ffffa0752c8b9088 R09: ffffa0752c8b9208
      [   40.683816] R10: 0000000000000006 R11: 0000000000000000 R12: ffffa0752d734000
      [   40.691783] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa07113c18000
      [   40.699750] FS:  00007f94548e5880(0000) GS:ffffa0752e980000(0000) knlGS:0000000000000000
      [   40.708782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   40.715189] CR2: 0000000000000400 CR3: 000000082b6ae006 CR4: 00000000001606e0
      [   40.723156] Call Trace:
      [   40.725888]  dev_qdisc_set_real_num_tx_queues+0x61/0x90
      [   40.731725]  netif_set_real_num_tx_queues+0x94/0x1d0
      [   40.737286]  __igb_open+0x19a/0x5d0 [igb]
      [   40.741767]  __dev_open+0xbb/0x150
      [   40.745567]  __dev_change_flags+0x157/0x1a0
      [   40.750240]  dev_change_flags+0x23/0x60
      
      [...]
      
      Fixes: 4cda7527 ("net: sched: make newly activated qdiscs visible")
      Reported-by: NMichal Kubecek <mkubecek@suse.cz>
      CC: Michal Kubecek <mkubecek@suse.cz>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c4046b1
  4. 12 3月, 2020 1 次提交
    • J
      net: sched: make newly activated qdiscs visible · 4cda7527
      Julian Wiedmann 提交于
      In their .attach callback, mq[prio] only add the qdiscs of the currently
      active TX queues to the device's qdisc hash list.
      If a user later increases the number of active TX queues, their qdiscs
      are not visible via eg. 'tc qdisc show'.
      
      Add a hook to netif_set_real_num_tx_queues() that walks all active
      TX queues and adds those which are missing to the hash list.
      
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cda7527
  5. 11 3月, 2020 2 次提交
    • L
      pktgen: Allow on loopback device · 1e09e581
      Lukas Wunner 提交于
      When pktgen is used to measure the performance of dev_queue_xmit()
      packet handling in the core, it is preferable to not hand down
      packets to a low-level Ethernet driver as it would distort the
      measurements.
      
      Allow using pktgen on the loopback device, thus constraining
      measurements to core code.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e09e581
    • S
      net: memcg: late association of sock to memcg · d752a498
      Shakeel Butt 提交于
      If a TCP socket is allocated in IRQ context or cloned from unassociated
      (i.e. not associated to a memcg) in IRQ context then it will remain
      unassociated for its whole life. Almost half of the TCPs created on the
      system are created in IRQ context, so, memory used by such sockets will
      not be accounted by the memcg.
      
      This issue is more widespread in cgroup v1 where network memory
      accounting is opt-in but it can happen in cgroup v2 if the source socket
      for the cloning was created in root memcg.
      
      To fix the issue, just do the association of the sockets at the accept()
      time in the process context and then force charge the memory buffer
      already used and reserved by the socket.
      Signed-off-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d752a498
  6. 10 3月, 2020 6 次提交
  7. 04 3月, 2020 4 次提交
  8. 29 2月, 2020 2 次提交
  9. 28 2月, 2020 1 次提交
    • M
      bpf: INET_DIAG support in bpf_sk_storage · 1ed4d924
      Martin KaFai Lau 提交于
      This patch adds INET_DIAG support to bpf_sk_storage.
      
      1. Although this series adds bpf_sk_storage diag capability to inet sk,
         bpf_sk_storage is in general applicable to all fullsock.  Hence, the
         bpf_sk_storage logic will operate on SK_DIAG_* nlattr.  The caller
         will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
         the argument.
      
      2. The request will be like:
      	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		......
      
         Considering there could have multiple bpf_sk_storages in a sk,
         instead of reusing INET_DIAG_INFO ("ss -i"),  the user can select
         some specific bpf_sk_storage to dump by specifying an array of
         SK_DIAG_BPF_STORAGE_REQ_MAP_FD.
      
         If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
         INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
         of a sk.
      
      3. The reply will be like:
      	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		......
      
      4. Unlike other INET_DIAG info of a sk which is pretty static, the size
         required to dump the bpf_sk_storage(s) of a sk is dynamic as the
         system adding more bpf_sk_storage_map.  It is hard to set a static
         min_dump_alloc size.
      
         Hence, this series learns it at the runtime and adjust the
         cb->min_dump_alloc as it iterates all sk(s) of a system.  The
         "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
         is for this purpose.
      
         The next patch will update the cb->min_dump_alloc as it
         iterates the sk(s).
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
      1ed4d924
  10. 27 2月, 2020 5 次提交
    • C
      net: fix sysfs permssions when device changes network namespace · ef6a4c88
      Christian Brauner 提交于
      Now that we moved all the helpers in place and make use netdev_change_owner()
      to fixup the permissions when moving network devices between network
      namespaces.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6a4c88
    • C
      net-sysfs: add queue_change_owner() · d755407d
      Christian Brauner 提交于
      Add a function to change the owner of the queue entries for a network device
      when it is moved between network namespaces.
      
      Currently, when moving network devices between network namespaces the
      ownership of the corresponding queue sysfs entries are not changed. This leads
      to problems when tools try to operate on the corresponding sysfs files. Fix
      this.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d755407d
    • C
      net-sysfs: add netdev_change_owner() · e6dee9f3
      Christian Brauner 提交于
      Add a function to change the owner of a network device when it is moved
      between network namespaces.
      
      Currently, when moving network devices between network namespaces the
      ownership of the corresponding sysfs entries is not changed. This leads
      to problems when tools try to operate on the corresponding sysfs files.
      This leads to a bug whereby a network device that is created in a
      network namespaces owned by a user namespace will have its corresponding
      sysfs entry owned by the root user of the corresponding user namespace.
      If such a network device has to be moved back to the host network
      namespace the permissions will still be set to the user namespaces. This
      means unprivileged users can e.g. trigger uevents for such incorrectly
      owned devices. They can also modify the settings of the device itself.
      Both of these things are unwanted.
      
      For example, workloads will create network devices in the host network
      namespace. Other tools will then proceed to move such devices between
      network namespaces owner by other user namespaces. While the ownership
      of the device itself is updated in
      net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
      entry for the device is not:
      
      drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 power
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
      drwxr-xr-x 4 nobody nobody    0 Jan 25 18:09 queues
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:08 subsystem -> ../../../../class/net
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
      
      However, if a device is created directly in the network namespace then
      the device's sysfs permissions will be correctly updated:
      
      drwxr-xr-x 5 root   root      0 Jan 25 18:12 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 address
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 power
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 root   root      0 Jan 25 18:12 queues
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 uevent
      
      Now, when creating a network device in a network namespace owned by a
      user namespace and moving it to the host the permissions will be set to
      the id that the user namespace root user has been mapped to on the host
      leading to all sorts of permission issues:
      
      458752
      drwxr-xr-x 5 458752 458752      0 Jan 25 18:12 .
      drwxr-xr-x 9 root   root        0 Jan 25 18:08 ..
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 address
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 power
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 458752 458752      0 Jan 25 18:12 queues
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 root   root        0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 uevent
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6dee9f3
    • M
      net: core: devlink.c: Use built-in RCU list checking · 2eb51c75
      Madhuparna Bhowmik 提交于
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
      
      The devlink->lock is held when devlink_dpipe_table_find()
      is called in non RCU read side section. Therefore, pass struct devlink
      to devlink_dpipe_table_find() for lockdep checking.
      Signed-off-by: NMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2eb51c75
    • A
      net: Fix Tx hash bound checking · 6e11d157
      Amritha Nambiar 提交于
      Fixes the lower and upper bounds when there are multiple TCs and
      traffic is on the the same TC on the same device.
      
      The lower bound is represented by 'qoffset' and the upper limit for
      hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
      mapping when there are multiple TCs, as the queue indices for upper TCs
      will be offset by 'qoffset'.
      
      v2: Fixed commit description based on comments.
      
      Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
      Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
      Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
      Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e11d157
  11. 26 2月, 2020 4 次提交
  12. 25 2月, 2020 3 次提交
  13. 24 2月, 2020 1 次提交
  14. 22 2月, 2020 1 次提交
    • J
      net: Generate reuseport group ID on group creation · 035ff358
      Jakub Sitnicki 提交于
      Commit 736b4602 ("net: Add ID (if needed) to sock_reuseport and expose
      reuseport_lock") has introduced lazy generation of reuseport group IDs that
      survive group resize.
      
      By comparing the identifier we check if BPF reuseport program is not trying
      to select a socket from a BPF map that belongs to a different reuseport
      group than the one the packet is for.
      
      Because SOCKARRAY used to be the only BPF map type that can be used with
      reuseport BPF, it was possible to delay the generation of reuseport group
      ID until a socket from the group was inserted into BPF map for the first
      time.
      
      Now that SOCK{MAP,HASH} can be used with reuseport BPF we have two options,
      either generate the reuseport ID on map update, like SOCKARRAY does, or
      allocate an ID from the start when reuseport group gets created.
      
      This patch takes the latter approach to keep sockmap free of calls into
      reuseport code. This streamlines the reuseport_id access as its lifetime
      now matches the longevity of reuseport object.
      
      The cost of this simplification, however, is that we allocate reuseport IDs
      for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their
      setups. With the way identifiers are currently generated, we can have at
      most S32_MAX reuseport groups, which hopefully is sufficient. If we ever
      get close to the limit, we can switch an u64 counter like sk_cookie.
      
      Another change is that we now always call into SOCKARRAY logic to unlink
      the socket from the map when unhashing or closing the socket. Previously we
      did it only when at least one socket from the group was in a BPF map.
      
      It is worth noting that this doesn't conflict with sockmap tear-down in
      case a socket is in a SOCK{MAP,HASH} and belongs to a reuseport
      group. sockmap tear-down happens first:
      
        prot->unhash
        `- tcp_bpf_unhash
           |- tcp_bpf_remove
           |  `- while (sk_psock_link_pop(psock))
           |     `- sk_psock_unlink
           |        `- sock_map_delete_from_link
           |           `- __sock_map_delete
           |              `- sock_map_unref
           |                 `- sk_psock_put
           |                    `- sk_psock_drop
           |                       `- rcu_assign_sk_user_data(sk, NULL)
           `- inet_unhash
              `- reuseport_detach_sock
                 `- bpf_sk_reuseport_detach
                    `- WRITE_ONCE(sk->sk_user_data, NULL)
      Suggested-by: NMartin Lau <kafai@fb.com>
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-10-jakub@cloudflare.com
      035ff358