1. 04 3月, 2020 1 次提交
  2. 03 3月, 2020 8 次提交
  3. 01 3月, 2020 13 次提交
  4. 29 2月, 2020 6 次提交
  5. 28 2月, 2020 9 次提交
    • M
      bpf: inet_diag: Dump bpf_sk_storages in inet_diag_dump() · 085c20ca
      Martin KaFai Lau 提交于
      This patch will dump out the bpf_sk_storages of a sk
      if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.
      
      An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
      INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
      If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.
      
      bpf_sk_storages can be added to the system at runtime.  It is difficult
      to find a proper static value for cb->min_dump_alloc.
      
      This patch learns the nlattr size required to dump the bpf_sk_storages
      of a sk.  If it happens to be the very first nlmsg of a dump and it
      cannot fit the needed bpf_sk_storages,  it will try to expand the
      skb by "pskb_expand_head()".
      
      Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
      sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
      be used.  In __inet_diag_dump(), it will retry as long as the
      skb is empty and the cb->min_dump_alloc becomes larger than before.
      cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE.  The min_dump_alloc
      is also changed from 'u16' to 'u32' to accommodate a sk that may have
      a few large bpf_sk_storages.
      
      The updated cb->min_dump_alloc will also be used to allocate the skb in
      the next dump.  This logic already exists in netlink_dump().
      
      Here is the sample output of a locally modified 'ss' and it could be made
      more readable by using BTF later:
      [root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
      State Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
      ESTAB 0      0              [::1]:51072        [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      ESTAB 0      0              [::1]:51070        [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      
      [root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
      State         Recv-Q         Send-Q                   Local Address:Port                    Peer Address:Port         Process
      ESTAB         0              0                                [::1]:51072                          [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
      ESTAB         0              0                                [::1]:51070                          [::1]:8989
      	 bpf_map_id:14 value:[ 3feb ]
      	 bpf_map_id:13 value:[ 3f ]
      	 bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
      085c20ca
    • M
      bpf: INET_DIAG support in bpf_sk_storage · 1ed4d924
      Martin KaFai Lau 提交于
      This patch adds INET_DIAG support to bpf_sk_storage.
      
      1. Although this series adds bpf_sk_storage diag capability to inet sk,
         bpf_sk_storage is in general applicable to all fullsock.  Hence, the
         bpf_sk_storage logic will operate on SK_DIAG_* nlattr.  The caller
         will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
         the argument.
      
      2. The request will be like:
      	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		......
      
         Considering there could have multiple bpf_sk_storages in a sk,
         instead of reusing INET_DIAG_INFO ("ss -i"),  the user can select
         some specific bpf_sk_storage to dump by specifying an array of
         SK_DIAG_BPF_STORAGE_REQ_MAP_FD.
      
         If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
         INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
         of a sk.
      
      3. The reply will be like:
      	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in latter patch)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		......
      
      4. Unlike other INET_DIAG info of a sk which is pretty static, the size
         required to dump the bpf_sk_storage(s) of a sk is dynamic as the
         system adding more bpf_sk_storage_map.  It is hard to set a static
         min_dump_alloc size.
      
         Hence, this series learns it at the runtime and adjust the
         cb->min_dump_alloc as it iterates all sk(s) of a system.  The
         "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
         is for this purpose.
      
         The next patch will update the cb->min_dump_alloc as it
         iterates the sk(s).
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
      1ed4d924
    • M
      inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data · 0df6d328
      Martin KaFai Lau 提交于
      The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
      the "dump()" is re-started.
      
      In a latter patch, it will also need to parse the new
      INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
      patch takes this chance to store the parsed nlattr in cb->data
      during the "start" time of a dump.
      
      By doing this, the "bc" argument also becomes unnecessary
      and is removed.  Also, the two copies of the INET_DIAG_REQ_BYTECODE
      parsing-audit logic between compat/current version can be
      consolidated to one.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230415.1975555-1-kafai@fb.com
      0df6d328
    • M
      inet_diag: Refactor inet_sk_diag_fill(), dump(), and dump_one() · 5682d393
      Martin KaFai Lau 提交于
      In a latter patch, there is a need to update "cb->min_dump_alloc"
      in inet_sk_diag_fill() as it learns the diffierent bpf_sk_storages
      stored in a sk while dumping all sk(s) (e.g. tcp_hashinfo).
      
      The inet_sk_diag_fill() currently does not take the "cb" as an argument.
      One of the reason is inet_sk_diag_fill() is used by both dump_one()
      and dump() (which belong to the "struct inet_diag_handler".  The dump_one()
      interface does not pass the "cb" along.
      
      This patch is to make dump_one() pass a "cb".  The "cb" is created in
      inet_diag_cmd_exact().  The "nlh" and "in_skb" are stored in "cb" as
      the dump() interface does.  The total number of args in
      inet_sk_diag_fill() is also cut from 10 to 7 and
      that helps many callers to pass fewer args.
      
      In particular,
      "struct user_namespace *user_ns", "u32 pid", and "u32 seq"
      can be replaced by accessing "cb->nlh" and "cb->skb".
      
      A similar argument reduction is also made to
      inet_twsk_diag_fill() and inet_req_diag_fill().
      
      inet_csk_diag_dump() and inet_csk_diag_fill() are also removed.
      They are mostly equivalent to inet_sk_diag_fill().  Their repeated
      usages are very limited.  Thus, inet_sk_diag_fill() is directly used
      in those occasions.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230409.1975173-1-kafai@fb.com
      5682d393
    • E
      net/mlx5e: Eswitch, Use per vport tables for mirroring · 96e32687
      Eli Cohen 提交于
      When using port mirroring, we forward the traffic to another table and
      use that table to forward to the mirrored vport. Since the hardware
      loses the values of reg c, and in particular reg c0, we fail the match
      on the input vport which previously existed in reg c0. To overcome this
      situation, we use a set of per vport tables, positioned at the lowest
      priority, and forward traffic to those tables. Since these tables are
      per vport, we can avoid matching on reg c0.
      
      Fixes: c01cfd0f ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path")
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Reviewed-by: NMark Bloch <markb@mellanox.com>
      Reviewed-by: NPaul Blakey <paulb@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      96e32687
    • G
      bpf: Replace zero-length array with flexible-array member · d7f10df8
      Gustavo A. R. Silva 提交于
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
      d7f10df8
    • G
      NFC: Replace zero-length array with flexible-array member · da60fbe7
      Gustavo A. R. Silva 提交于
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da60fbe7
    • R
      net: dsa: propagate resolved link config via mac_link_up() · 5b502a7b
      Russell King 提交于
      Propagate the resolved link configuration down via DSA's
      phylink_mac_link_up() operation to allow split PCS/MAC to work.
      Tested-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b502a7b
    • R
      net: phylink: propagate resolved link config via mac_link_up() · 91a208f2
      Russell King 提交于
      Propagate the resolved link parameters via the mac_link_up() call for
      MACs that do not automatically track their PCS state. We propagate the
      link parameters via function arguments so that inappropriate members
      of struct phylink_link_state can't be accessed, and creating a new
      structure just for this adds needless complexity to the API.
      Tested-by: NAndre Przywara <andre.przywara@arm.com>
      Tested-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
      Tested-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91a208f2
  6. 27 2月, 2020 3 次提交
    • C
      device: add device_change_owner() · b8f33e5d
      Christian Brauner 提交于
      Add a helper to change the owner of a device's sysfs entries. This
      needs to happen when the ownership of a device is changed, e.g. when
      moving network devices between network namespaces.
      This function will be used to correctly account for ownership changes,
      e.g. when moving network devices between network namespaces.
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8f33e5d
    • C
      sysfs: add sysfs_change_owner() · 2c4f9401
      Christian Brauner 提交于
      Add a helper to change the owner of sysfs objects.
      This function will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      
      This mirrors how a kobject is added through driver core which in its guts is
      done via kobject_add_internal() which in summary creates the main directory via
      create_dir(), populates that directory with the groups associated with the
      ktype of the kobject (if any) and populates the directory with the basic
      attributes associated with the ktype of the kobject (if any). These are the
      basic steps that are associated with adding a kobject in sysfs.
      Any additional properties are added by the specific subsystem itself (not by
      driver core) after it has registered the device. So for the example of network
      devices, a network device will e.g. register a queue subdirectory under the
      basic sysfs directory for the network device and than further subdirectories
      within that queues subdirectory.  But that is all specific to network devices
      and they call the corresponding sysfs functions to do that directly when they
      create those queue objects. So anything that a subsystem adds outside of what
      driver core does must also be changed by it (That's already true for removal of
      files it created outside of driver core.) and it's the same for ownership
      changes.
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c4f9401
    • C
      sysfs: add sysfs_group{s}_change_owner() · 303a4276
      Christian Brauner 提交于
      Add helpers to change the owner of sysfs groups.
      This function will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303a4276