1. 01 Apr, 2020 2 commits
  2. 31 Mar, 2020 38 commits
    • 5a470b1a
    • netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write · 3902baf9
      Gustavo A. R. Silva committed
      In case memory resources for dummy_data were allocated, release them
      before return.
      
      Addresses-Coverity-ID: 1491997 ("Resource leak")
      Fixes: 7ef19d3b ("devlink: report error once U32_MAX snapshot ids have been used")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
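The leak pattern described in this commit can be sketched in plain C. This is only an illustration of "free the buffer before the early return", not the driver code: `tracked_alloc()`, `tracked_free()`, `snapshot_id_get()` and `take_snapshot()` are hypothetical stand-ins, and the allocation counter exists only so the effect is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocation counter so a leak is observable. */
static int live_allocs;

static void *tracked_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Stand-in for a snapshot ID allocator that can run out of IDs. */
static int snapshot_id_get(int ids_left, unsigned int *id)
{
	if (ids_left <= 0)
		return -ENOSPC;
	*id = 1;
	return 0;
}

/* Returns 0 on success and hands the buffer to the caller via *out. */
static int take_snapshot(int ids_left, void **out)
{
	unsigned int id;
	void *dummy_data;
	int err;

	dummy_data = tracked_alloc(64);
	if (!dummy_data)
		return -ENOMEM;

	err = snapshot_id_get(ids_left, &id);
	if (err) {
		/* The fix: release the buffer before the early return. */
		tracked_free(dummy_data);
		return err;
	}

	*out = dummy_data;
	return 0;
}
```

Calling `take_snapshot(0, &buf)` exercises the error path; with the `tracked_free()` call in place, `live_allocs` stays at zero afterwards.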
    • Merge branch 'stmmac-Add-additional-EHL-PCI-info-and-PCI-ID' · 1a795da7
      David S. Miller committed
      Voon Weifeng says:
      
      ====================
      stmmac: Add additional EHL PCI info and PCI ID
      
      Thanks Jose Miguel Abreu for the feedback. Summary of v2 patches:
      
      1/3: As suggested, keep the stmmac_pci.c file simple by creating a new
           file, dwmac-intel.c, and moving all the Intel-specific PCI devices
           out of stmmac_pci.c.
      
      2/3: Added Intel(R) Programmable Services Engine (Intel(R) PSE) MAC PCI ID
           and PCI info
      
      3/3: Added EHL 2.5Gbps PCI ID and info
      
      Changes from v1:
      - Added a patch to move all Intel-specific PCI devices from stmmac_pci.c
        to a new file named dwmac-intel.c.
      - Combined v1 patches 1/3 and 2/3 into a single patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: add EHL 2.5Gbps PCI info and PCI ID · d63439f5
      Voon Weifeng committed
      Add EHL SGMII 2.5Gbps PCI info and PCI ID
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID · 67c08ac4
      Voon Weifeng committed
      Add EHL PSE0/1 RGMII & SGMII 1Gbps PCI info and PCI ID
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: create dwmac-intel.c to contain all Intel platform · 58da0cfa
      Voon Weifeng committed
      As the stmmac_pci.c file is getting bigger and more complex, it is reasonable
      to move all the Intel-specific dwmac PCI devices into a separate file.
      This move includes Intel Quark, TGL and EHL. A new kernel config
      CONFIG_DWMAC_INTEL is introduced and depends on X86. For this initial
      patch, all the necessary functions, such as probe() and exit(), are identical
      apart from their names.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-b53-and-bcm_sf2-updates-for-7278' · 60d79ab3
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: b53 & bcm_sf2 updates for 7278
      
      This patch series contains some updates to the b53 and bcm_sf2 drivers
      specifically for the 7278 Ethernet switch.
      
      The first patch is technically a bug fix so it should ideally be
      backported to -stable, provided that Dan also agrees with my resolution
      on this.
      
      Patches #2 through #4 are minor changes to the core b53 driver to
      restore VLAN configuration upon system resumption as well as deny
      specific bridge/VLAN operations on port 7 with the 7278 which is special
      and does not support VLANs.
      
      Patches #5 through #9 add support for matching VLAN TCI keys/masks to
      the CFP code.
      
      Changes in v2:
      
      - fixed some code comments and rearranged some code for easier reading
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Support specifying VLAN tag egress rule · 8b3abe30
      Florian Fainelli committed
      The port to which the ASP is connected on 7278 is not capable of
      processing VLAN tags as part of the Ethernet frame, so allow a user to
      configure the egress VLAN policy they want applied by repurposing
      the h_ext.data[1] field. Bit 0 selects the policy: 0=tagged,
      1=untagged.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
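The bit layout described in this commit can be illustrated with a small user-space sketch. It assumes the field is carried in big-endian byte order (as ethtool's flow-extension data words are declared); `EGRESS_UNTAGGED_BIT` and `egress_untagged()` are hypothetical names, not identifiers from the driver.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 0 of h_ext.data[1]: 0 = tagged, 1 = untagged. */
#define EGRESS_UNTAGGED_BIT 0x1u

/* data1_be is the raw big-endian word as supplied by user space (assumed). */
static bool egress_untagged(uint32_t data1_be)
{
	return (ntohl(data1_be) & EGRESS_UNTAGGED_BIT) != 0;
}
```

With this encoding, `egress_untagged(htonl(1))` is true and `egress_untagged(htonl(0))` is false; the remaining bits are ignored in this sketch.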
    • net: dsa: bcm_sf2: Add support for matching VLAN TCI · 7555020c
      Florian Fainelli committed
      Update relevant code paths to support the programming and matching of
      VLAN TCI. This is the only member of ethtool_flow_ext that we can
      match; the switch does not permit matching the VLAN Ethernet Type field.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions · c2d639d1
      Florian Fainelli committed
      In preparation for matching VLANs, move the writing of CFP_DATA(5) into
      the IPv4 and IPv6 slicing logic since they are part of the per-flow
      configuration.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT · 5ae8c0d5
      Florian Fainelli committed
      We do not currently support matching on FLOW_EXT or FLOW_MAC_EXT, but we
      were not checking for those bits being set in the flow specification.
      
      The checks for FLOW_EXT and FLOW_MAC_EXT are separated out because a
      subsequent commit will add support for matching VLAN TCI, which is
      covered by FLOW_EXT.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Disable learning for ASP port · 8b6b208b
      Florian Fainelli committed
      We don't want to enable learning for the ASP port since it only receives
      directed traffic. This allows us to bypass ARL-driven forwarding rules
      which could conflict with Broadcom tags and/or CFP forwarding.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge · 31bfc2d4
      Florian Fainelli committed
      On 7278, port 7 connects to the ASP which should only receive frames
      through the use of CFP rules. It is not desirable to have it be part of
      a bridge at all since that would make it pick up unwanted traffic that
      it may not even be able to filter or sustain.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Prevent tagged VLAN on port 7 for 7278 · 88631864
      Florian Fainelli committed
      On 7278, port 7 of the switch connects to the ASP UniMAC which is not
      capable of processing VLAN tagged frames. We can still allow the port to
      be part of a VLAN entry, and we may want it to be untagged on egress on
      that VLAN because of that limitation.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Restore VLAN entries upon (re)configuration · d7a0b1f7
      Florian Fainelli committed
      The first time b53_configure_vlan() is called we have not configured any
      VLAN entries yet, since that happens later when interfaces get brought
      up. When b53_configure_vlan() is called again from suspend/resume we
      need to restore all VLAN entries though.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Fix overflow checks · d0802dc4
      Florian Fainelli committed
      Commit f949a12f ("net: dsa: bcm_sf2: fix buffer overflow doing
      set_rxnfc") tried to fix some user-controlled buffer overflows in
      bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del(), but the fix used
      CFP_NUM_RULES, which, while correct for not overflowing the bitmaps, is
      not representative of what the device actually supports. Correct that by
      using bcm_sf2_cfp_rule_size() instead.
      
      The latter subtracts 1 from the number of rules, so change the checks from
      "greater than or equal to" to "greater than" accordingly.
      
      Fixes: f949a12f ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
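The change of comparison operator follows from the helper returning "count minus one". A minimal sketch, assuming a bitmap capacity of 256 and a device that supports fewer rules; `cfp_rule_size()`, `old_location_ok()` and `location_ok()` are hypothetical stand-ins rather than the driver's functions:

```c
#include <assert.h>
#include <stdbool.h>

#define CFP_NUM_RULES 256 /* bitmap capacity; value assumed for illustration */

/* Highest valid rule index on a device that supports num_rules rules. */
static unsigned int cfp_rule_size(unsigned int num_rules)
{
	return num_rules - 1;
}

/* Earlier check: protects the bitmap but not the device's real limit. */
static bool old_location_ok(unsigned int location)
{
	return !(location >= CFP_NUM_RULES);
}

/* Revised check: rule_size is already "count - 1", so compare with '>'. */
static bool location_ok(unsigned int location, unsigned int num_rules)
{
	return !(location > cfp_rule_size(num_rules));
}
```

For a device with 128 rules, index 200 passes the earlier check but is rejected by the revised one, while index 127 (the last valid entry) is still accepted.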
    • hv_netvsc: Remove unnecessary round_up for recv_completion_cnt · f87238d3
      Haiyang Zhang committed
      vzalloc_node() already rounds the total size up to whole pages, and
      sizeof(u64) is smaller than sizeof(struct recv_comp_data). So
      round_up of recv_completion_cnt is not necessary, and may cause extra
      memory allocation.
      
      To save memory, remove this unnecessary round_up for recv_completion_cnt.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
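The memory argument can be checked with simple arithmetic. The sketch below assumes a 4096-byte page, a 16-byte element, and that the removed rounding was to a multiple of PAGE_SIZE / sizeof(u64); all three are assumptions for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

static size_t round_up_to(size_t n, size_t multiple)
{
	return (n + multiple - 1) / multiple * multiple;
}

/* Pages a page-granular allocator hands out for a request of `bytes`. */
static size_t pages_for(size_t bytes)
{
	return round_up_to(bytes, PAGE_SIZE) / PAGE_SIZE;
}

/* Old behaviour (assumed): round the count to a page's worth of u64 first. */
static size_t pages_with_round_up(size_t count, size_t elem_size)
{
	size_t rounded = round_up_to(count, PAGE_SIZE / sizeof(uint64_t));

	return pages_for(rounded * elem_size);
}

/* New behaviour: let the allocator do the only rounding. */
static size_t pages_without_round_up(size_t count, size_t elem_size)
{
	return pages_for(count * elem_size);
}
```

With 600 elements of 16 bytes each, rounding the count first requests 1024 elements (4 pages), whereas the plain request of 9600 bytes needs only 3 pages.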
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · d9679cd9
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for net-next:
      
      1) Add support to specify a stateful expression in set definitions,
         this allows users to specify e.g. counters per set elements.
      
      2) Flowtable software counter support.
      
      3) Flowtable hardware offload counter support, from wenxu.
      
      4) Parallelize flowtable hardware offload requests, from Paul Blakey.
         This includes a patch to add one work entry per offload command.
      
      5) Several patches to rework nf_queue refcount handling, from Florian
         Westphal.
      
      6) A few fixes for the flowtable tunnel offload: Fix crash if tunneling
         information is missing and set up indirect flow block as TC_SETUP_FT,
         patch from wenxu.
      
      7) Stricter netlink attribute sanity check on filters, from Romain Bellan
         and Florent Fourcot.
      
      8) Annotations to make sparse happy, from Jules Irenge.
      
      9) Improve icmp errors in debugging information, from Haishuang Yan.
      
      10) Fix warning in IPVS icmp error debugging, from Haishuang Yan.
      
      11) Fix endianness issue in tcp extension header, from Sergey Marinkevich.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Add-packet-trap-policers-support' · 6fe9a949
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      Add packet trap policers support
      
      Background
      ==========
      
      Devices capable of offloading the kernel's datapath and performing
      functions such as bridging and routing must also be able to send (trap)
      specific packets to the kernel (i.e., the CPU) for processing.
      
      For example, a device acting as a multicast-aware bridge must be able to
      trap IGMP membership reports to the kernel for processing by the bridge
      module.
      
      Motivation
      ==========
      
      In most cases, the underlying device is capable of handling packet rates
      that are several orders of magnitude higher compared to those that can
      be handled by the CPU.
      
      Therefore, in order to prevent the underlying device from overwhelming
      the CPU, devices usually include packet trap policers that are able to
      police the trapped packets to rates that can be handled by the CPU.
      
      Proposed solution
      =================
      
      This patch set allows capable device drivers to register their supported
      packet trap policers with devlink. User space can then tune the
      parameters of these policers (currently, rate and burst size) and read
      from the device the number of packets that were dropped by the policer,
      if supported.
      
      These packet trap policers can then be bound to existing packet trap
      groups, which are used to aggregate logically related packet traps. As a
      result, trapped packets are policed to rates that can be handled by the
      host CPU.
      
      Example usage
      =============
      
      Instantiate netdevsim:
      
      Dump available packet trap policers:
      netdevsim/netdevsim10:
        policer 1 rate 1000 burst 128
        policer 2 rate 2000 burst 256
        policer 3 rate 3000 burst 512
      
      Change the parameters of a packet trap policer:
      
      Bind a packet trap policer to a packet trap group:
      
      Dump parameters and statistics of a packet trap policer:
      netdevsim/netdevsim10:
        policer 3 rate 100 burst 16
          stats:
              rx:
                dropped 92
      
      Unbind a packet trap policer from a packet trap group:
      
      Patch set overview
      ==================
      
      Patch #1 adds the core infrastructure in devlink which allows capable
      device drivers to register their supported packet trap policers with
      devlink.
      
      Patch #2 extends the existing devlink-trap documentation.
      
      Patch #3 extends netdevsim to register a few dummy packet trap policers
      with devlink. Used later on to selftest the core infrastructure.
      
      Patches #4-#5 add infrastructure in devlink to allow binding of packet
      trap policers to packet trap groups.
      
      Patch #6 extends netdevsim to allow such binding.
      
      Patch #7 adds a selftest over netdevsim that verifies the core
      devlink-trap policers functionality.
      
      Patches #8-#14 gradually add devlink-trap policers support in mlxsw.
      
      Patch #15 adds a selftest over mlxsw. All registered packet trap
      policers are verified to handle the configured rate and burst size.
      
      Future plans
      ============
      
      * Allow changing default association between packet traps and packet
        trap groups
      * Add more packet traps. For example, for control packets (e.g., IGMP)
      
      v3:
      * Rebase
      
      v2 (address comments from Jiri and Jakub):
      * Patch #1: Add 'strict_start_type' in devlink policy
      * Patch #1: Have device drivers provide max/min rate/burst size for each
        policer. Use them to check validity of user provided parameters
      * Patch #3: Remove check about burst size being a power of 2 and instead
        add a debugfs knob to fail the operation
      * Patch #3: Provide max/min rate/burst size when registering policers
        and remove the validity checks from nsim_dev_devlink_trap_policer_set()
      * Patch #5: Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
        devlink_trap_group_set() and bail if not present
      * Patch #5: Add extack error message in case trap group was partially
        modified
      * Patch #7: Add test case with new 'fail_trap_policer_set' knob
      * Patch #7: Add test case for partially modified trap group
      * Patch #10: Provide max/min rate/burst size when registering policers
      * Patch #11: Remove the max/min validity checks from
        __mlxsw_sp_trap_policer_set()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
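The cover letter above describes policers in terms of two parameters, rate and burst size, plus a drop counter. One common way to realize such a policer is a token bucket; the sketch below is a software model under that assumption, not a description of how any particular device or driver implements it. `struct policer`, `policer_init()` and `policer_admit()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

struct policer {
	uint64_t rate;    /* packets per second */
	uint64_t burst;   /* bucket depth, in packets */
	uint64_t tokens;  /* packets currently allowed through */
	uint64_t last_ns; /* timestamp of the last refill */
	uint64_t dropped; /* packets rejected so far */
};

static void policer_init(struct policer *p, uint64_t rate, uint64_t burst)
{
	p->rate = rate;
	p->burst = burst;
	p->tokens = burst; /* start with a full bucket */
	p->last_ns = 0;
	p->dropped = 0;
}

/*
 * Returns true if a packet arriving at now_ns may be passed to the CPU.
 * now_ns must not go backwards; very long idle gaps could overflow the
 * multiplication and are not handled in this sketch.
 */
static bool policer_admit(struct policer *p, uint64_t now_ns)
{
	uint64_t earned = (now_ns - p->last_ns) * p->rate / NSEC_PER_SEC;

	if (earned > 0) {
		p->tokens += earned;
		if (p->tokens > p->burst)
			p->tokens = p->burst;
		/* Advance only by the time that was converted into tokens. */
		p->last_ns += earned * NSEC_PER_SEC / p->rate;
	}

	if (p->tokens == 0) {
		p->dropped++;
		return false;
	}
	p->tokens--;
	return true;
}
```

With rate 100 and burst 16, a back-to-back burst of 20 packets at time zero lets 16 through and counts 4 as dropped; ten milliseconds later exactly one more packet is admitted.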
    • selftests: mlxsw: Add test cases for devlink-trap policers · 9f3e63c5
      Ido Schimmel committed
      Add test cases that verify that each registered packet trap policer:
      
      * Honors the imposed limitations of rate and burst size
      * Is able to police trapped packets to the specified rate
      * Is able to police trapped packets to the specified burst size
      * Can be unbound from its trap group
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Add support for setting of packet trap group parameters · 39defcbb
      Ido Schimmel committed
      Implement support for setting of packet trap group parameters by
      invoking the trap_group_init() callback with the new parameters.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Switch to use correct packet trap group · d12d8468
      Ido Schimmel committed
      Some packet traps are currently exposed to user space as members of the
      "l3_drops" trap group, but internally they are members of a different
      group.
      
      Switch these traps to use the correct group so that they are all subject
      to the same policer, as exposed to user space.
      
      Set the trap priority of packets trapped due to loopback error during
      routing to the lowest priority. Such packets are not routed again by the
      kernel and therefore should not mask other traps (e.g., host miss) that
      should be routed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Do not initialize dedicated discard policer · bc82521e
      Ido Schimmel committed
      The policer is now initialized as part of the registration with devlink,
      so there is no need to initialize it before the registration.
      
      Remove the initialization.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Add devlink-trap policer support · 13f2e64b
      Ido Schimmel committed
      Register supported packet trap policers with devlink and implement
      callbacks to change their parameters and read their counters.
      
      Prevent user space from passing invalid policer parameters down to the
      device by checking their validity and communicating the failure via an
      appropriate extack message.
      
      v2:
      * Remove the max/min validity checks from __mlxsw_sp_trap_policer_set()
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Prepare policers for registration with devlink · 4561705b
      Ido Schimmel committed
      Prepare an array of policer IDs to register with devlink and their
      associated parameters.
      
      The array is composed of both policers that are currently bound to
      exposed trap groups and policers that are not bound to any trap group.
      
      v2:
      * Provide max/min rate/burst size when registering policers
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Track used packet trap policer IDs · 03484e49
      Ido Schimmel committed
      During initialization the driver configures various packet trap groups
      and binds policers to them.
      
      Currently, most of these groups are not exposed to user space and
      therefore their policers should not be exposed either. Otherwise, user
      space will be able to alter policer parameters without knowing which
      packet traps are policed by the policer.
      
      Use a bitmap to track the used policer IDs so that these policers will
      not be registered with devlink in a subsequent patch.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Extend QPCR register · 2b84d7c3
      Ido Schimmel committed
      The QoS Policer Configuration Register (QPCR) is used to configure
      hardware policers. Extend this register with the following fields and
      defines, which will be used by subsequent patches:
      
      1. Violate counter: reads number of packets dropped by the policer
      2. Clear counter: to ensure we start counting from 0
      3. Rate and burst size limits
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: netdevsim: Add test cases for devlink-trap policers · 5fbff58e
      Ido Schimmel committed
      Add test cases for packet trap policer set / show commands as well as
      for the binding of these policers to packet trap groups.
      
      Both good and bad flows are tested for maximum coverage.
      
      v2:
      * Add test case with new 'fail_trap_policer_set' knob
      * Add test case for partially modified trap group
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add support for setting of packet trap group parameters · 0dc8249a
      Ido Schimmel committed
      Add a dummy callback to set trap group parameters. Return an error when
      the 'fail_trap_group_set' debugfs file is set in order to exercise error
      paths and verify that the error is propagated to user space when it should be.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Allow setting of packet trap group parameters · c064875a
      Ido Schimmel committed
      The previous patch allowed device drivers to publish their default
      binding between packet trap policers and packet trap groups. However,
      some users might not be content with this binding and would like to
      change it.
      
      In case user space passed a packet trap policer identifier when setting
      a packet trap group, invoke the appropriate device driver callback and
      pass the new policer identifier.
      
      v2:
      * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
        devlink_trap_group_set() and bail if not present
      * Add extack error message in case trap group was partially modified
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add packet trap group parameters support · f9f54392
      Ido Schimmel committed
      Packet trap groups are used to aggregate logically related packet traps.
      Currently, these groups allow user space to batch operations such as
      setting the trap action of all member traps.
      
      In order to prevent the CPU from being overwhelmed by too many trapped
      packets, it is desirable to bind a packet trap policer to these groups.
      For example, to limit all the packets that encountered an exception
      during routing to 10Kpps.
      
      Allow device drivers to bind default packet trap policers to packet trap
      groups when the latter are registered with devlink.
      
      The next patch will enable user space to change this default binding.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add devlink-trap policer support · ad188458
      Ido Schimmel committed
      Register three dummy packet trap policers with devlink and implement
      callbacks to change their parameters and read their counters.
      
      This will be used later on in the series to test the devlink-trap
      policer infrastructure.
      
      v2:
      * Remove check about burst size being a power of 2 and instead add a
        debugfs knob to fail the operation
      * Provide max/min rate/burst size when registering policers and remove
        the validity checks from nsim_dev_devlink_trap_policer_set()
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: Add description of packet trap policers · ef7d5c7d
      Ido Schimmel committed
      Extend devlink-trap documentation with information about packet trap
      policers.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add packet trap policers support · 1e8c6619
      Ido Schimmel committed
      Devices capable of offloading the kernel's datapath and performing
      functions such as bridging and routing must also be able to send (trap)
      specific packets to the kernel (i.e., the CPU) for processing.
      
      For example, a device acting as a multicast-aware bridge must be able to
      trap IGMP membership reports to the kernel for processing by the bridge
      module.
      
      In most cases, the underlying device is capable of handling packet rates
      that are several orders of magnitude higher compared to those that can
      be handled by the CPU.
      
      Therefore, in order to prevent the underlying device from overwhelming
      the CPU, devices usually include packet trap policers that are able to
      police the trapped packets to rates that can be handled by the CPU.
      
      This patch allows capable device drivers to register their supported
      packet trap policers with devlink. User space can then tune the
      parameters of these policers (currently, rate and burst size) and read
      from the device the number of packets that were dropped by the policer,
      if supported.
      
      Subsequent patches in the series will allow device drivers to create
      default binding between these policers and packet trap groups and allow
      user space to change the binding.
      
      v2:
      * Add 'strict_start_type' in devlink policy
      * Have device drivers provide max/min rate/burst size for each policer.
        Use them to check validity of user provided parameters
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cgroup-bpf_link' · 8596a75f
      Alexei Starovoitov committed
      Andrii Nakryiko says:
      
      ====================
      bpf_link abstraction itself was formalized in [0] with justifications for why
      its semantics is a good fit for attaching BPF programs of various types. This
      patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
      programs.
      
      Cgroup BPF link is semantically compatible with current BPF_F_ALLOW_MULTI
      semantics of attaching cgroup BPF programs directly. Thus cgroup bpf_link can
      co-exist with legacy BPF program multi-attachment.
      
      bpf_link is destroyed and automatically detached when the last open FD holding
      the reference to bpf_link is closed. This means that by default, when the
      process that created bpf_link exits, attached BPF program will be
      automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
      any other bpf_link, can be pinned in BPF FS and by those means survive the
      exit of process that created the link. This is useful in many scenarios to
      provide long-living BPF program attachments. Pinning also means that there
      could be many owners of bpf_link through independent FDs.
      
      Additionally, auto-detachment of cgroup bpf_link is implemented. When a cgroup is
      dying, it will automatically detach all active bpf_links. This ensures that
      cgroup clean up is not delayed due to an active bpf_link despite there being no
      chance for any BPF program to be run for a given cgroup. In that sense it's similar
      to existing behavior of dropping refcnt of attached bpf_prog. But in the case
      of bpf_link, bpf_link is not destroyed and is still available to user as long
      as at least one active FD is still open (or if it's pinned in BPF FS).
      
      There are two main cgroup-specific differences between bpf_link-based and
      direct bpf_prog-based attachment.
      
      First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
      bpf_link, which makes it possible to auto-clean up attached bpf_link when user
      process abruptly exits without explicitly detaching BPF program. This makes
      for a safe default behavior proven in BPF tracing program types. But bpf_link
      doesn't bump cgroup->bpf.refcnt as well and because of that doesn't prevent
      cgroup from cleaning up its BPF state.
      
      Second, only owners of bpf_link (those who created bpf_link in the first place
      or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
      it. This makes sure that no other process can accidentally remove/replace BPF
      program.
      
      This patch set also implements the LINK_UPDATE sub-command, which allows
      replacing bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
      behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
      is supposed to be a generic command for different types of bpf_links.
      
        [0] https://lore.kernel.org/bpf/20200228223948.360936-1-andriin@fb.com/
      
      v2->v3:
        - revert back to just MULTI mode (Alexei);
        - fix tinyconfig compilation warning (kbuild test robot);
      
      v1->v2:
        - implement exclusive and overridable exclusive modes (Andrey Ignatov);
        - fix build for !CONFIG_CGROUP_BPF build;
        - add more selftests for non-multi mode and inter-operability;
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
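The lifetime rules in the cover letter above (a link lives while any FD or BPF FS pin refers to it, and a dying cgroup detaches the program without destroying the link) can be modelled with a toy reference count. This is purely a conceptual sketch: `struct toy_link` and its helpers are hypothetical and do not correspond to kernel or libbpf structures.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_link {
	int refs;      /* open FDs plus BPF FS pins */
	bool attached; /* program still runs for the cgroup */
	bool alive;    /* link object still exists */
};

static void link_create(struct toy_link *l)
{
	l->refs = 1; /* the creating process holds one FD */
	l->attached = true;
	l->alive = true;
}

/* Another owner appears: a pin in BPF FS or an FD opened from a pin. */
static void link_get(struct toy_link *l)
{
	assert(l->alive);
	l->refs++;
}

/* An FD is closed (e.g. the process exits) or a pin is removed. */
static void link_put(struct toy_link *l)
{
	assert(l->alive && l->refs > 0);
	if (--l->refs == 0) {
		l->attached = false; /* last owner gone: auto-detach */
		l->alive = false;    /* and destroy the link */
	}
}

/* The cgroup goes away: detach the program, but keep the link object. */
static void cgroup_dying(struct toy_link *l)
{
	l->attached = false;
}
```

In this model a pinned link survives the exit of its creator, and after `cgroup_dying()` the link is detached yet still exists until its last reference is dropped.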
    • selftests/bpf: Test FD-based cgroup attachment · 7cccee42
      Andrii Nakryiko committed
      Add selftests to exercise FD-based cgroup BPF program attachments and their
      intermixing with legacy cgroup BPF attachments. Auto-detachment and program
      replacement (both unconditional and cmpxchg-like) are tested as well.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-5-andriin@fb.com
    • libbpf: Add support for bpf_link-based cgroup attachment · cc4f864b
      Andrii Nakryiko committed
      Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
      create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
      
      If expected_attach_type is not specified explicitly with
      bpf_program__set_expected_attach_type(), libbpf will try to determine proper
      attach type from BPF program's section definition.
      
      Also add support for bpf_link's underlying BPF program replacement:
        - unconditional through high-level bpf_link__update_program() API;
        - cmpxchg-like with specifying expected current BPF program through
          low-level bpf_link_update() API.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-4-andriin@fb.com
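The difference between the unconditional and the cmpxchg-like program replacement mentioned in this commit can be sketched with a toy model. The types and the function below are hypothetical and are not the libbpf API; the choice of `-EPERM` for a mismatch is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_prog {
	int id;
};

struct upd_link {
	const struct toy_prog *prog; /* program currently behind the link */
};

/*
 * Replace the link's program. If expected_old is NULL the update is
 * unconditional; otherwise it only succeeds when the link still points
 * at expected_old (compare-and-exchange semantics).
 */
static int toy_link_update(struct upd_link *link,
			   const struct toy_prog *new_prog,
			   const struct toy_prog *expected_old)
{
	if (expected_old && link->prog != expected_old)
		return -EPERM; /* someone else already swapped the program */

	link->prog = new_prog;
	return 0;
}
```

An unconditional update always wins, while a conditional update with a stale `expected_old` leaves the link untouched, which lets a caller detect that another owner replaced the program in the meantime.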