1. Aug 12, 2019: 7 commits
    • drop_monitor: Allow truncation of dropped packets · 57986617
      Committed by Ido Schimmel
      When sending dropped packets to user space it is not always necessary to
      copy the entire packet as usually only the headers are of interest.
      
      Allow the user to specify the truncation length, and add the original
      length of the packet as additional metadata to the netlink message.
      
      By default no truncation is performed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
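      The truncation rule can be illustrated with a small userspace sketch. It
      assumes that a truncation length of 0 means "no truncation" (the default
      described above) and uses a hypothetical struct dropped_pkt_meta; neither
      is drop_monitor's actual netlink attribute layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata sent alongside a (possibly truncated) packet copy.
 * Field names are illustrative, not drop_monitor's netlink attributes. */
struct dropped_pkt_meta {
	size_t orig_len; /* length of the packet when it was dropped */
	size_t copy_len; /* number of bytes actually copied to user space */
};

/* Assumption: trunc_len == 0 means "copy the whole packet" (the default). */
static struct dropped_pkt_meta compute_copy_len(size_t pkt_len, uint32_t trunc_len)
{
	struct dropped_pkt_meta meta = { .orig_len = pkt_len, .copy_len = pkt_len };

	if (trunc_len != 0 && trunc_len < pkt_len)
		meta.copy_len = trunc_len; /* copy only the leading bytes (headers) */
	return meta;
}
```

      For a 1500-byte packet and a truncation length of 128, only 128 bytes
      are copied while orig_len still reports 1500, so user space can tell
      that the copy is partial.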
    • drop_monitor: Add packet alert mode · ca30707d
      Committed by Ido Schimmel
      So far, drop monitor supported only one alert mode, in which a summary
      of the locations where packets were recently dropped was sent to user
      space.
      
      This alert mode is sufficient to understand that packets were dropped,
      but lacks the information needed for a more detailed analysis.
      
      Add a new alert mode in which the dropped packet itself is passed to
      user space along with metadata: the drop location (as program counter
      and resolved symbol), ingress netdevice and drop timestamp. More
      metadata can be added in the future.
      
      To avoid performing expensive operations in the context in which
      kfree_skb() is invoked (which can be hard IRQ context), the dropped skb
      is cloned and queued on a per-CPU skb drop list. Then, in process
      context, the netlink message is allocated, prepared and finally sent to
      user space.
      
      The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting
      the system's memory. Subsequent patches will make this limit
      configurable and also add a counter that indicates how many skbs were
      tail dropped.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
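      A userspace sketch of the bounded, tail-dropping queue described above.
      The limit of 1000 comes from the commit message; the integer "packet IDs",
      the struct layout and the tail_drops counter are illustrative assumptions
      (the counter mirrors what the message says later patches will add). The
      real per-CPU list holds cloned skbs and is protected by a lock, which is
      omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

#define DROP_QUEUE_LIMIT 1000 /* limit stated in the commit message */

/* Hypothetical stand-in for one CPU's skb drop list: a fixed-size FIFO. */
struct drop_queue {
	int items[DROP_QUEUE_LIMIT];
	size_t head;              /* index of the oldest entry */
	size_t count;             /* number of queued entries */
	unsigned long tail_drops; /* entries rejected because the queue was full */
};

/* Called where the packet is dropped: enqueue at the tail, or tail drop. */
static bool drop_queue_push(struct drop_queue *q, int pkt_id)
{
	if (q->count == DROP_QUEUE_LIMIT) {
		q->tail_drops++;
		return false;
	}
	q->items[(q->head + q->count) % DROP_QUEUE_LIMIT] = pkt_id;
	q->count++;
	return true;
}

/* Called later in process context: dequeue the oldest entry, if any. */
static bool drop_queue_pop(struct drop_queue *q, int *pkt_id)
{
	if (q->count == 0)
		return false;
	*pkt_id = q->items[q->head];
	q->head = (q->head + 1) % DROP_QUEUE_LIMIT;
	q->count--;
	return true;
}
```

      Tail dropping keeps the cost in the dropping context constant: when the
      list is full, the new packet is simply not queued.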
    • drop_monitor: Add alert mode operations · 28315f79
      Committed by Ido Schimmel
      The next patch is going to add another alert mode in which the dropped
      packet itself is sent to user space, instead of only a summary of recent
      drops.
      
      Abstract the differences between the modes by adding alert mode
      operations. The operations are selected based on the currently
      configured mode and associated with the probes and the work item just
      before tracing starts.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
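      The idea of per-mode operations can be sketched as a table of function
      pointers chosen once, before tracing starts. All names here (enum
      alert_mode, struct alert_ops, the handlers and counters) are hypothetical
      illustrations, not drop_monitor's actual symbols.

```c
#include <stddef.h>

enum alert_mode { ALERT_MODE_SUMMARY, ALERT_MODE_PACKET };

/* Hypothetical operations table: each mode supplies a probe handler (run
 * where the packet is dropped) and a work handler (run later to notify
 * user space). */
struct alert_ops {
	void (*drop_probe)(size_t pkt_len);
	void (*work_item)(void);
};

/* Counters stand in for real work so the dispatch can be observed. */
static unsigned long summary_drops, summary_sends, packet_drops, packet_sends;

static void summary_probe(size_t pkt_len) { (void)pkt_len; summary_drops++; }
static void summary_work(void) { summary_sends++; }
static void packet_probe(size_t pkt_len) { (void)pkt_len; packet_drops++; }
static void packet_work(void) { packet_sends++; }

static const struct alert_ops summary_ops = { summary_probe, summary_work };
static const struct alert_ops packet_ops = { packet_probe, packet_work };

/* Select the operations for the configured mode just before tracing starts. */
static const struct alert_ops *select_alert_ops(enum alert_mode mode)
{
	return mode == ALERT_MODE_PACKET ? &packet_ops : &summary_ops;
}
```

      Because the callers only go through the table, adding a mode means
      adding one more struct alert_ops rather than branching in every probe.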
    • drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration · c5ab9b1c
      Committed by Ido Schimmel
      Currently, the configure command does not do anything but return an
      error. Subsequent patches will enable the command to change various
      configuration options such as alert mode and packet truncation.
      
      Similar to other netlink-based configuration channels, make sure only
      users with the CAP_NET_ADMIN capability set can execute this command.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Reset per-CPU data before starting to trace · 44075f56
      Committed by Ido Schimmel
      The function reset_per_cpu_data() allocates and prepares a new skb for
      the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is
      stored in the per-CPU 'data' variable and the old one is returned.
      
      The function is invoked during module initialization and from the
      workqueue, before an alert is sent. This means that it is possible to
      receive an alert with stale data, if we stopped tracing when the
      hysteresis timer ('data->send_timer') was pending.
      
      Instead of invoking the function during module initialization, invoke it
      just before we start tracing and ensure we get a fresh skb.
      
      This also allows us to remove the calls to initialize the timer and the
      work item from the module initialization path, since both could have
      been triggered by the error paths of reset_per_cpu_data().
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
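      The swap-and-return pattern, and why calling it just before tracing
      starts avoids stale data, can be sketched in userspace C. The real code
      swaps an skb in per-CPU data; here struct alert_msg, struct cpu_data,
      reset_cpu_data() and start_tracing() are hypothetical simplifications.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the summary alert message being built. */
struct alert_msg {
	unsigned int nr_entries; /* drop locations recorded so far */
};

/* Hypothetical per-CPU state; the real code keeps an skb here. */
struct cpu_data {
	struct alert_msg *msg;
};

/* Install a freshly allocated message and hand back the previous one. */
static int reset_cpu_data(struct cpu_data *data, struct alert_msg **old)
{
	struct alert_msg *fresh = calloc(1, sizeof(*fresh));

	if (!fresh)
		return -ENOMEM;
	*old = data->msg;
	data->msg = fresh;
	return 0;
}

/* Reset right before tracing starts, so entries left over from a previous
 * session (e.g. while a send timer was pending) are discarded. */
static int start_tracing(struct cpu_data *data)
{
	struct alert_msg *old;
	int err = reset_cpu_data(data, &old);

	if (err)
		return err;
	free(old);
	/* Registering the tracepoint probes would follow here (omitted). */
	return 0;
}
```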
    • drop_monitor: Initialize timer and work item upon tracing enable · 70c69274
      Committed by Ido Schimmel
      The timer and work item are currently initialized once during module
      init, but subsequent patches will need to associate different functions
      with the work item, based on the configured alert mode.
      
      Allow subsequent patches to make that change by initializing and
      de-initializing these objects during tracing enable and disable.
      
      This also guarantees that once the request to disable tracing returns,
      no more netlink notifications will be generated.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Split tracing enable / disable to different functions · 7c747838
      Committed by Ido Schimmel
      Subsequent patches will need to enable / disable tracing based on the
      configured alerting mode.
      
      Reduce the nesting level and prepare for the introduction of this
      functionality by splitting the tracing enable / disable operations into
      two different functions.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Aug 11, 2019: 18 commits
  3. Aug 10, 2019: 15 commits
    • Merge tag 'mlx5-updates-2019-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 38b9e0f6
      Committed by David S. Miller
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2019-08-09
      
      This series includes updates to the mlx5 Ethernet and core drivers:
      
      In the first 11 patches, Vlad submits part 2 of a 3-part series to
      allow TC flow handling for concurrent execution.
      
      1) TC flow handling for concurrent execution (part 2)
      
      Vlad says:
      ==========
      
      Refactor data structures that are shared between flows in tc.
      Currently, all cls API hardware offload driver callbacks require the
      caller to hold the rtnl lock when calling them. The cls API has already
      been updated to update software filters in parallel (on classifiers that
      support unlocked execution); however, the hardware offload code still
      obtains the rtnl lock before calling driver tc callbacks. This set
      implements support for unlocked execution of the tc hairpin, mod_hdr and
      encap subsystems. The changes implemented in these subsystems are very
      similar in general.
      
      The main difference is that hairpin is accessed through mlx5e_tc_table
      (legacy mode), mod_hdr is accessed through both mlx5e_tc_table and
      mlx5_esw_offload (legacy and switchdev modes) and encap is only accessed
      through mlx5_esw_offload (switchdev mode).
      
      1.1) Hairpin handling code and structure mlx5e_hairpin_entry are
      refactored in the following way:
      
      - The hairpin structure is extended with an atomic reference counter.
        This approach allows looking up a hairpin entry and obtaining a
        reference to it under hairpin_tbl_lock protection, and then continuing
        to use the entry unlocked (including provisioning it to hardware).
      
      - To support unlocked provisioning of a hairpin entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        hairpin_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same hairpin entry first wait
        for the completion, which prevents access to entries that are not fully
        initialized.
      
      - The hairpin entry is extended with a new flows_lock spinlock to protect
        the list when multiple concurrent tc instances update flows attached to
        the same hairpin entry.
      
      1.2) Modify header handling code and structure mlx5e_mod_hdr_entry
      are refactored in the following way:
      
      - The mod_hdr structure is extended with an atomic reference counter.
        This approach allows looking up a mod_hdr entry and obtaining a
        reference to it under mod_hdr_tbl_lock protection, and then continuing
        to use the entry unlocked (including provisioning it to hardware).
      
      - To support unlocked provisioning of a mod_hdr entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        mod_hdr_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same mod_hdr entry first wait
        for the completion, which prevents access to entries that are not fully
        initialized.
      
      - The mod_hdr entry is extended with a new flows_lock spinlock to protect
        the list when multiple concurrent tc instances update flows attached to
        the same mod_hdr entry.
      
      1.3) Encapsulation handling code and structure mlx5e_encap_entry
      are refactored in the following way:
      
      - The encap structure is extended with an atomic reference counter.
        This approach allows looking up an encap entry and obtaining a
        reference to it under encap_tbl_lock protection, and then continuing
        to use the entry unlocked (including provisioning it to hardware).
      
      - To support unlocked provisioning of an encap entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        encap_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same encap entry first wait
        for the completion, which prevents access to entries that are not fully
        initialized.
      
      - Unlike the approach used to refactor hairpin and mod_hdr, the encap
        entry is not extended with any per-entry fine-grained lock. Instead,
        encap_tbl_lock is used to synchronize all operations on the encap table
        and on instances of mlx5e_encap_entry. This is necessary because a
        single flow can be attached to multiple encap entries simultaneously.
        During new flow creation or a neigh update event, all encaps that the
        flow is attached to must be accessed together atomically, which makes
        using a per-entry lock infeasible.
      
      - The encap entry is extended with a new flows_lock spinlock to protect
        the list when multiple concurrent tc instances update flows attached to
        the same encap entry.
      
      ==========
      
      3) Parav improves the way port representors report their parent ID and
      port index.
      
      4) Use refcount_t for the refcount in the vxlan database, from Chuhong Yuan.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 62ad42ec
    • selftests: Fix detection of nettest command in fcnal-test · f887427b
      Committed by David Ahern
      Most of the tests run by fcnal-test.sh rely on the nettest command.
      Rather than trying to cover all of the individual tests, check for the
      binary only at the beginning.
      
      This also removes the need for log_error, which is undefined.
      
      Fixes: 6f9d5cac ("selftests: Setup for functional tests for fib and socket lookups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Use refcount_t for refcount · b51c225e
      Committed by Chuhong Yuan
      refcount_t is better suited for reference counters because its
      implementation can prevent overflows. Convert the atomic_t reference
      counters to refcount_t.
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
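      To illustrate why a saturating counter is safer than a wrapping
      atomic_t, here is a toy userspace counter built on C11 atomics. It is not
      the kernel's refcount_t implementation: the type, the function names and
      the saturation value (UINT_MAX) are assumptions. It only shows the idea
      that a counter about to overflow sticks at a saturation value (leaking
      the object) instead of wrapping around toward zero and causing a
      premature free.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_SATURATED UINT_MAX /* assumed saturation value for this toy */

typedef struct {
	atomic_uint refs;
} toy_refcount_t;

/* Increment unless already saturated; never wraps past REF_SATURATED. */
static void toy_refcount_inc(toy_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == REF_SATURATED)
			return; /* stay saturated: leak rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; returns true only for the caller that dropped the last ref. */
static bool toy_refcount_dec_and_test(toy_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == REF_SATURATED)
			return false; /* saturated objects are never freed */
		if (old == 0)
			return false; /* underflow: refuse to wrap below zero */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```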
    • net/mlx5e: Use vhca_id in generating representor port_index · c938451f
      Committed by Parav Pandit
      Unique port indices are desired when the devlink instances of multiple
      PCI devices have the same switch ID.
      
      Make use of vhca_id to generate such unique devlink port indices.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Simplify querying port representor parent id · 724ee179
      Committed by Parav Pandit
      The system image GUID doesn't depend on the eswitch switchdev mode.
      
      Hence, remove the check, which simplifies the code.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-switch, Removed unused hwid · ef2e4094
      Committed by Parav Pandit
      Currently, mlx5_eswitch_rep stores the same HW ID for all representors.
      However, it is never used from this structure; it is always used from
      mlx5_vport.
      
      Hence, remove the unused field.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Allow concurrent creation of encap entries · d589e785
      Committed by Vlad Buslov
      Creation of encap entries is fully synchronized by encap_tbl_lock. In
      order to allow concurrent allocation of the hardware resources used to
      offload encapsulation, extend mlx5e_encap_entry with a 'res_ready'
      completion. Move the call to mlx5e_tc_tun_create_header_ipv{4|6}() out
      of the encap_tbl_lock critical section. Modify the code that attaches
      new flows to an existing encap to wait for the 'res_ready' completion
      before using the entry. Insert the encap entry into the table before
      provisioning it to hardware, and modify all users of the encap table to
      verify that the encap was fully initialized by checking the completion
      result for a non-zero value (and to wait for the 'res_ready' completion,
      if necessary).
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
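      The pattern of publishing an entry before its hardware resources are
      ready can be sketched in userspace C with pthreads. This is an
      illustration under stated assumptions, not the driver's code: the
      completion is emulated with a mutex and condition variable,
      provision_hw() is a hypothetical stand-in for the firmware call (it
      fails for negative keys so the error path can be exercised), and the
      result field (0 on success, negative errno on failure) is an assumed
      encoding that may differ from the driver's completion result. Releasing
      references is left out for brevity.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct completion. */
struct compl {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void compl_init(struct compl *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void compl_complete_all(struct compl *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void compl_wait(struct compl *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical, simplified table entry. */
struct entry {
	int key;
	unsigned int refcnt;     /* protected by tbl_lock in this sketch */
	int result;              /* valid only after res_ready completes */
	struct compl res_ready;
	struct entry *next;
};

static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *tbl;

/* Hypothetical stand-in for the slow firmware call. */
static int provision_hw(int key) { return key < 0 ? -EIO : 0; }

static struct entry *entry_get(int key, int *err)
{
	struct entry *e;

	pthread_mutex_lock(&tbl_lock);
	for (e = tbl; e; e = e->next)
		if (e->key == key)
			break;
	if (e) {
		e->refcnt++;
		pthread_mutex_unlock(&tbl_lock);
		compl_wait(&e->res_ready); /* wait without holding tbl_lock */
		*err = e->result;
		return e;
	}
	e = calloc(1, sizeof(*e));
	if (!e) {
		pthread_mutex_unlock(&tbl_lock);
		*err = -ENOMEM;
		return NULL;
	}
	e->key = key;
	e->refcnt = 1;
	compl_init(&e->res_ready);
	e->next = tbl;
	tbl = e; /* published before the hardware is ready */
	pthread_mutex_unlock(&tbl_lock);

	e->result = provision_hw(key); /* slow call made outside tbl_lock */
	compl_complete_all(&e->res_ready);
	*err = e->result;
	return e;
}
```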
    • net/mlx5e: Protect encap hash table with mutex · 61086f39
      Committed by Vlad Buslov
      To remove the dependency on the rtnl lock, protect the encap hash table
      from concurrent modifications with a new "encap_tbl_lock" mutex. Use the
      mutex to protect the internal encap entry state from concurrent
      modification as well. This is necessary because a flow can be attached to
      multiple encap entries simultaneously, which significantly complicates
      using a finer-grained per-entry lock.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Extend encap entry with reference counter · 948993f2
      Committed by Vlad Buslov
      The list of flows attached to an encap entry is used as an implicit
      reference counter (the encap entry is deallocated when the list becomes
      empty) and as a mechanism to obtain the encap entry that a flow is
      attached to (through the list head). This is not safe when concurrent
      modification of the list of flows attached to an encap entry is
      possible. A proper atomic reference counter is required to support
      concurrent access.
      
      As a preparation for extending encap with reference counting, extract
      the code that looks up and deletes encap entries into standalone put/get
      helpers. In order to remove this dependency on external locking, extend
      the encap entry with a reference counter to manage its lifetime, and
      extend the flow structure with a direct pointer to the encap entry that
      the flow is attached to.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
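      The get/put helpers and the direct flow-to-entry pointer can be sketched
      in userspace C. The names (struct encap, struct flow, encap_get(),
      encap_put(), tbl_lock) are hypothetical; for simplicity the counter is a
      plain integer guarded by the table mutex rather than an atomic counter,
      and the last put unlinks and frees the entry while holding that mutex so
      a concurrent lookup cannot find an entry that is being freed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified encap entry. */
struct encap {
	int key;
	unsigned int refcnt; /* protected by tbl_lock in this sketch */
	struct encap *next;
};

/* A flow keeps a direct pointer instead of reaching the entry via a list. */
struct flow {
	struct encap *e;
};

static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct encap *tbl;

/* Look up (or create) the entry for key and take a reference on it. */
static struct encap *encap_get(int key)
{
	struct encap *e;

	pthread_mutex_lock(&tbl_lock);
	for (e = tbl; e; e = e->next)
		if (e->key == key)
			break;
	if (e) {
		e->refcnt++;
	} else if ((e = calloc(1, sizeof(*e))) != NULL) {
		e->key = key;
		e->refcnt = 1;
		e->next = tbl;
		tbl = e;
	}
	pthread_mutex_unlock(&tbl_lock);
	return e;
}

/* Drop a reference; the last user unlinks and frees the entry. */
static void encap_put(struct encap *e)
{
	struct encap **pp;

	pthread_mutex_lock(&tbl_lock);
	if (--e->refcnt == 0) {
		for (pp = &tbl; *pp; pp = &(*pp)->next) {
			if (*pp == e) {
				*pp = e->next;
				break;
			}
		}
		free(e);
	}
	pthread_mutex_unlock(&tbl_lock);
}
```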
    • net/mlx5e: Allow concurrent creation of mod_hdr entries · a734d007
      Committed by Vlad Buslov
      Creation of mod_hdr entries is fully synchronized by mod_hdr_tbl->lock.
      In order to allow concurrent allocation of the hardware resources used
      to offload header rewrite, extend mlx5e_mod_hdr_entry with a 'res_ready'
      completion. Move the call to mlx5_modify_header_alloc() out of the
      mod_hdr_tbl->lock critical section. Modify the code that attaches new
      flows to an existing mh to wait for the 'res_ready' completion before
      using the entry. Insert the mh into the mod_hdr table before provisioning
      it to hardware, and modify all users of the mod_hdr table to verify that
      the mh was fully initialized by checking the completion result for a
      negative value (and to wait for the 'res_ready' completion, if
      necessary).
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Protect mod_hdr hash table with mutex · d2faae25
      Committed by Vlad Buslov
      To remove the dependency on the rtnl lock, protect the mod_hdr hash
      table from concurrent modifications with a new mutex.
      
      Implement a helper function to get the flow namespace to prevent code
      duplication.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Protect mod header entry flows list with spinlock · 83a52f0d
      Committed by Vlad Buslov
      To remove the dependency on the rtnl lock, extend the mod header entry
      with a spinlock and use it to protect the list of flows attached to the
      mod header entry from concurrent modifications.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Extend mod header entry with reference counter · dd58edc3
      Committed by Vlad Buslov
      The list of flows attached to a mod header entry is used as an implicit
      reference counter (the mod header entry is deallocated when the list
      becomes empty) and as a mechanism to obtain the mod header entry that a
      flow is attached to (through the list head). This is not safe when
      concurrent modification of the list of flows attached to a mod header
      entry is possible. A proper atomic reference counter is required to
      support concurrent access.
      
      As a preparation for extending mod header with reference counting,
      extract the code that looks up and deletes mod header entries into
      standalone put/get helpers. In order to remove this dependency on
      external locking, extend the mod header entry with a reference counter
      to manage its lifetime, and extend the flow structure with a direct
      pointer to the mod header entry that the flow is attached to.
      
      To remove code duplication between the legacy and switchdev mode
      implementations, which both support mod_hdr functionality, store the
      mod_hdr table in a dedicated structure used by both the fdb and kernel
      namespaces. The new table structure is extended with a table lock by one
      of the following patches in this series. Implement a helper function to
      get the correct mod_hdr table depending on the flow namespace.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Allow concurrent creation of hairpin entries · db76ca24
      Committed by Vlad Buslov
      Creation of hairpin entries is fully synchronized by hairpin_tbl_lock.
      In order to allow concurrent initialization of mlx5e_hairpin structure
      instances and provisioning of hairpin entries to hardware, extend
      mlx5e_hairpin_entry with a 'res_ready' completion. Move the call to
      mlx5e_hairpin_create() out of the hairpin_tbl_lock critical section.
      Modify the code that attaches new flows to an existing hpe to wait for
      the 'res_ready' completion before using the hpe. Insert the hpe into the
      hairpin table before provisioning it to hardware, and modify all users of
      the hairpin table to verify that the hpe was fully initialized by
      checking the hpe->hp pointer (and to wait for the 'res_ready'
      completion, if necessary).
      
      Modify the dead peer update event handling function to save hpes to a
      temporary list with their reference counters incremented. Wait for
      completion of the hpes in the temporary list and update their
      'peer_gone' flag outside of the hairpin_tbl_lock critical section.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
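      The two-phase handling of a dead peer event can be sketched in userspace
      C: matching entries are pinned and collected under the table lock, and
      the slower per-entry work happens after the lock is released. struct
      hpe, tbl_lock, hpe_put() and handle_dead_peer() are hypothetical
      simplifications; the wait on each entry's 'res_ready' completion is only
      marked by a comment, and freeing on the last put is omitted.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hairpin entry. */
struct hpe {
	int peer_id;
	unsigned int refcnt;   /* protected by tbl_lock */
	bool peer_gone;
	struct hpe *next;      /* hairpin table linkage */
	struct hpe *dead_next; /* linkage for the temporary list */
};

static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct hpe *tbl;

static void hpe_put(struct hpe *h)
{
	pthread_mutex_lock(&tbl_lock);
	h->refcnt--; /* unlinking and freeing at zero omitted in this sketch */
	pthread_mutex_unlock(&tbl_lock);
}

static void handle_dead_peer(int peer_id)
{
	struct hpe *h, *dead = NULL;

	/* Phase 1: under the table lock, only pin matches and collect them. */
	pthread_mutex_lock(&tbl_lock);
	for (h = tbl; h; h = h->next) {
		if (h->peer_id == peer_id) {
			h->refcnt++;
			h->dead_next = dead;
			dead = h;
		}
	}
	pthread_mutex_unlock(&tbl_lock);

	/* Phase 2: outside the lock. The driver would first wait for each
	 * entry's res_ready completion here (omitted in this sketch). */
	while (dead) {
		h = dead;
		dead = h->dead_next;
		h->peer_gone = true;
		hpe_put(h);
	}
}
```

      Holding a reference on every collected entry is what makes it safe to
      drop the table lock before waiting: no entry on the temporary list can be
      freed in the meantime.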