1. Aug 12, 2019: 15 commits
    • net: phy: add phy_modify_paged_changed · bf22b343
      Heiner Kallweit committed
      Add helper function phy_modify_paged_changed; its behavior is the same
      as that of phy_modify_changed.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
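A minimal userspace sketch of the contract described above, assuming a simulated paged register file (struct sim_phy and the sim_* helpers are hypothetical, not phylib API). Like phy_modify_changed(), the modify step returns a negative errno on error, 0 if the register value did not change and 1 if it did; the paged variant selects the page, performs that modify and then restores the previously selected page.

```c
#include <errno.h>
#include <stdint.h>

#define SIM_PAGES 4
#define SIM_REGS  32

/* Hypothetical model of a PHY with paged registers. */
struct sim_phy {
	uint16_t regs[SIM_PAGES][SIM_REGS];
	int page; /* currently selected page */
};

/* Read-modify-write on the current page: -errno, 0 (unchanged) or 1 (changed). */
static int sim_modify_changed(struct sim_phy *p, int reg, uint16_t mask,
			      uint16_t set)
{
	uint16_t old, val;

	if (reg < 0 || reg >= SIM_REGS)
		return -EINVAL;
	old = p->regs[p->page][reg];
	val = (uint16_t)((old & ~mask) | set);
	if (val == old)
		return 0;
	p->regs[p->page][reg] = val;
	return 1;
}

/* Paged variant: select the page, modify, restore the old page. */
static int sim_modify_paged_changed(struct sim_phy *p, int page, int reg,
				    uint16_t mask, uint16_t set)
{
	int saved, ret;

	if (page < 0 || page >= SIM_PAGES)
		return -EINVAL;
	saved = p->page;
	p->page = page;
	ret = sim_modify_changed(p, reg, mask, set);
	p->page = saved;
	return ret;
}
```

A caller can use a return value of 1 as the signal that an advertisement register really changed and autonegotiation should be restarted.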
    • net: phy: prepare phylib to deal with PHY's extending Clause 22 · f4069cd7
      Heiner Kallweit committed
      The integrated PHY in the 2.5Gbps chip RTL8125 is the first PHY (known
      to me) that uses standard Clause 22 for all modes up to 1Gbps and adds
      2.5Gbps control using vendor-specific registers. To use phylib for
      the standard part, a few small extensions are needed:
      - Move most of genphy_config_aneg to a new function
        __genphy_config_aneg that takes a parameter indicating whether
        restarting auto-negotiation is needed (depending on whether the
        content of the vendor-specific advertisement register changed).
      - Don't clear phydev->lp_advertising in genphy_read_status, so that
        non-C22 mode flags can be set beforehand.

      Basically, both changes mimic the behavior of the equivalent Clause 45
      functions.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
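The interplay between the vendor-specific register and the new "changed" parameter can be sketched in userspace as follows. struct sim_aneg, sim_driver_config_aneg() and sim_genphy_config_aneg() are hypothetical stand-ins for a PHY driver and for __genphy_config_aneg(); the real function also talks to the PHY over MDIO and checks BMCR state, which this sketch omits.

```c
#include <stdbool.h>

/* Hypothetical PHY state: plain integers stand in for registers. */
struct sim_aneg {
	unsigned int c22_adv;    /* stand-in for the Clause 22 advertisement */
	unsigned int vendor_adv; /* stand-in for a vendor 2.5G register */
	int restarts;            /* number of autoneg restarts issued */
};

/* Generic part: restart autoneg if the caller reports a change or the
 * Clause 22 advertisement itself changes. */
static int sim_genphy_config_aneg(struct sim_aneg *s, unsigned int c22_adv,
				  bool changed)
{
	if (s->c22_adv != c22_adv) {
		s->c22_adv = c22_adv;
		changed = true;
	}
	if (changed)
		s->restarts++;
	return 0;
}

/* Driver part: update the vendor register first, then pass on whether
 * it changed. */
static int sim_driver_config_aneg(struct sim_aneg *s, unsigned int c22_adv,
				  unsigned int vendor_adv)
{
	bool changed = false;

	if (s->vendor_adv != vendor_adv) {
		s->vendor_adv = vendor_adv;
		changed = true;
	}
	return sim_genphy_config_aneg(s, c22_adv, changed);
}
```

Without the extra parameter, a change that only touches the vendor register (for example, dropping 2.5Gbps from the advertisement) would not trigger a restart.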
    • net: phy: simplify genphy_config_advert by using the linkmode_adv_to_xxx_t functions · 3eef8689
      Heiner Kallweit committed
      Using linkmode_adv_to_mii_adv_t and linkmode_adv_to_mii_ctrl1000_t
      simplifies the code. In addition, avoiding the conversion to the
      legacy u32 advertisement format allows the warning to be removed.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
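A simplified userspace sketch of converting a link-mode bitmap directly into MII_ADVERTISE and MII_CTRL1000 values and reporting whether the update changed anything. The ADVERTISE_* values match <linux/mii.h>; the sim_link_mode bit positions and the sim_* helpers are assumptions for illustration, not the kernel's linkmode_adv_to_mii_adv_t() or linkmode_adv_to_mii_ctrl1000_t().

```c
#include <stdint.h>

/* Illustrative bit positions (not the ETHTOOL_LINK_MODE_* numbering). */
enum sim_link_mode {
	SIM_10baseT_Half, SIM_10baseT_Full,
	SIM_100baseT_Half, SIM_100baseT_Full,
	SIM_1000baseT_Half, SIM_1000baseT_Full,
	SIM_Pause, SIM_Asym_Pause,
};

/* Register bits as defined in <linux/mii.h>. */
#define ADVERTISE_10HALF     0x0020 /* MII_ADVERTISE */
#define ADVERTISE_10FULL     0x0040
#define ADVERTISE_100HALF    0x0080
#define ADVERTISE_100FULL    0x0100
#define ADVERTISE_PAUSE_CAP  0x0400
#define ADVERTISE_PAUSE_ASYM 0x0800
#define ADVERTISE_1000HALF   0x0100 /* MII_CTRL1000 */
#define ADVERTISE_1000FULL   0x0200

/* Fields of each register that this sketch manages. */
#define SIM_ADV_MASK (ADVERTISE_10HALF | ADVERTISE_10FULL | ADVERTISE_100HALF | \
		      ADVERTISE_100FULL | ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM)
#define SIM_CTRL1000_MASK (ADVERTISE_1000HALF | ADVERTISE_1000FULL)
#define SIM_HAS(modes, bit) (((modes) >> (bit)) & 1u)

static uint16_t sim_adv_to_mii_adv(uint32_t modes)
{
	uint16_t adv = 0;

	if (SIM_HAS(modes, SIM_10baseT_Half))  adv |= ADVERTISE_10HALF;
	if (SIM_HAS(modes, SIM_10baseT_Full))  adv |= ADVERTISE_10FULL;
	if (SIM_HAS(modes, SIM_100baseT_Half)) adv |= ADVERTISE_100HALF;
	if (SIM_HAS(modes, SIM_100baseT_Full)) adv |= ADVERTISE_100FULL;
	if (SIM_HAS(modes, SIM_Pause))         adv |= ADVERTISE_PAUSE_CAP;
	if (SIM_HAS(modes, SIM_Asym_Pause))    adv |= ADVERTISE_PAUSE_ASYM;
	return adv;
}

static uint16_t sim_adv_to_mii_ctrl1000(uint32_t modes)
{
	uint16_t adv = 0;

	if (SIM_HAS(modes, SIM_1000baseT_Half)) adv |= ADVERTISE_1000HALF;
	if (SIM_HAS(modes, SIM_1000baseT_Full)) adv |= ADVERTISE_1000FULL;
	return adv;
}

struct sim_regs {
	uint16_t advertise; /* MII_ADVERTISE */
	uint16_t ctrl1000;  /* MII_CTRL1000 */
};

/* Update only the managed fields; return 1 if either register changed. */
static int sim_config_advert(struct sim_regs *r, uint32_t modes)
{
	uint16_t adv = (uint16_t)((r->advertise & ~SIM_ADV_MASK) |
				  sim_adv_to_mii_adv(modes));
	uint16_t c1000 = (uint16_t)((r->ctrl1000 & ~SIM_CTRL1000_MASK) |
				    sim_adv_to_mii_ctrl1000(modes));
	int changed = (adv != r->advertise) || (c1000 != r->ctrl1000);

	r->advertise = adv;
	r->ctrl1000 = c1000;
	return changed;
}
```

Because the bitmap is converted straight into register values, no intermediate legacy u32 advertisement word is needed.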
    • netdevsim: register couple of devlink params · 150e8f8a
      Jiri Pirko committed
      Register a couple of devlink params, one generic and one
      driver-specific. Make the values available over debugfs.
      
      Example:
      $ echo "111" > /sys/bus/netdevsim/new_device
      $ devlink dev param
      netdevsim/netdevsim111:
        name max_macs type generic
          values:
            cmode driverinit value 32
        name test1 type driver-specific
          values:
            cmode driverinit value true
      $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
      32
      $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1
      Y
      $ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit value 16
      $ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit value false
      $ devlink dev reload netdevsim/netdevsim111
      $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
      16
      $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1
      N
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
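The driverinit behavior in the example (new values only take effect after "devlink dev reload") can be modeled in a few lines of C. This is a hypothetical userspace sketch: struct sim_params and the sim_* functions are illustrative, not netdevsim or devlink code. In the kernel, devlink core holds the driverinit value and the driver picks it up when it reinitializes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter state: running values plus pending driverinit values. */
struct sim_params {
	uint32_t max_macs;         /* value the driver currently uses */
	bool test1;
	uint32_t pending_max_macs; /* value set with cmode driverinit */
	bool pending_test1;
};

static void sim_params_init(struct sim_params *p)
{
	/* Defaults as shown in the example output: 32 and true. */
	p->max_macs = p->pending_max_macs = 32;
	p->test1 = p->pending_test1 = true;
}

/* "devlink dev param set ... cmode driverinit": only record the value. */
static void sim_param_set_max_macs(struct sim_params *p, uint32_t v)
{
	p->pending_max_macs = v;
}

static void sim_param_set_test1(struct sim_params *p, bool v)
{
	p->pending_test1 = v;
}

/* "devlink dev reload": the driver re-reads driverinit values. */
static void sim_reload(struct sim_params *p)
{
	p->max_macs = p->pending_max_macs;
	p->test1 = p->pending_test1;
}

/* A debugfs bool reads back as Y or N. */
static char sim_show_test1(const struct sim_params *p)
{
	return p->test1 ? 'Y' : 'N';
}
```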
    • Merge branch 'drop_monitor-Capture-dropped-packets-and-metadata' · 6e5ee483
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      drop_monitor: Capture dropped packets and metadata
      
      So far drop monitor supported only one mode of operation in which a
      summary of recent packet drops is periodically sent to user space as a
      netlink event. The event only includes the drop location (program
      counter) and number of drops in the last interval.
      
      While this mode of operation allows one to understand if the system is
      dropping packets, it is not sufficient if a more detailed analysis is
      required. Both the packet itself and related metadata are missing.
      
      This patchset extends drop monitor with another mode of operation where
      the packet (potentially truncated) and metadata (e.g., drop location,
      timestamp, netdev) are sent to user space as a netlink event. Thanks to
      the extensible nature of netlink, more metadata can be added in the
      future.
      
      To avoid performing expensive operations in the context in which
      kfree_skb() is called, the dropped skbs are cloned and queued on a per-CPU
      skb drop list. The list is then processed in process context (using a
      workqueue), where the netlink messages are allocated, prepared and
      finally sent to user space.
      
      A follow-up patchset will integrate drop monitor with devlink and allow
      the latter to call into drop monitor to report hardware drops. In the
      future, XDP drops can be added as well, thereby making drop monitor the
      go-to netlink channel for diagnosing all packet drops.
      
      Example usage with patched dropwatch [1] can be found here [2]. Example
      dissection of drop monitor netlink events with patched wireshark [3] can
      be found here [4]. I will submit both changes upstream after the kernel
      changes are accepted. Another change worth making is adding a dropmon
      pseudo interface to libpcap, similar to the nflog interface [5]. This
      will allow users to specifically listen on dropmon traffic instead of
      capturing all netlink packets via the nlmon netdev.
      
      Patches #1-#5 prepare the code towards the actual changes in later
      patches.
      
      Patch #6 adds another mode of operation to drop monitor in which the
      dropped packet itself is notified to user space along with metadata.
      
      Patch #7 allows users to truncate reported packets to a specific length,
      in case only the headers are of interest. The original length of the
      packet is added as metadata to the netlink notification.
      
      Patch #8 allows users to query the current configuration of drop monitor
      (e.g., alert mode, truncation length).
      
      Patches #9-#10 allow users to tune the length of the per-CPU skb drop
      list according to their needs.
      
      Changes since v1 [6]:
      * Add skb protocol as metadata. This allows user space to correctly
        dissect the packet instead of blindly assuming it is an Ethernet
        packet
      
      Changes since RFC [7]:
      * Limit the length of the per-CPU skb drop list and make it configurable
      * Do not use the hysteresis timer in packet alert mode
      * Introduce alert mode operations in a separate patch and only then
        introduce the new alert mode
      * Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside
        a union with 'dev_scratch' and therefore not guaranteed to point to a
        valid netdev
      * Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop
        monitor while it is monitoring
      * Did not change schedule_work() in favor of schedule_work_on() as I did
        not observe a change in the number of tail drops
      
      [1] https://github.com/idosch/dropwatch/tree/packet-mode
      [2] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt
      [3] https://github.com/idosch/wireshark/tree/drop-monitor-v2
      [4] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt
      [5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c
      [6] https://patchwork.ozlabs.org/cover/1143443/
      [7] https://patchwork.ozlabs.org/cover/1135226/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Expose tail drop counter · e9feb580
      Ido Schimmel committed
      The previous patch made the length of the per-CPU skb drop list
      configurable. Expose a counter that shows how many packets could not be
      enqueued to this list.

      This allows users to determine the desired queue length.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
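A userspace sketch of a bounded per-CPU queue with a tail-drop counter, assuming a fixed number of simulated CPUs (SIM_NR_CPUS, struct sim_dm and the helpers are hypothetical). Each CPU only updates its own counters on the hot path, and the reported value is the sum across CPUs. Real 64-bit per-CPU statistics in the kernel also need read-side synchronization on 32-bit systems, which is omitted here, as is draining the queue.

```c
#include <stdint.h>

#define SIM_NR_CPUS 4

/* Hypothetical per-CPU queue state. */
struct sim_cpu_queue {
	unsigned int len;    /* current queue length */
	uint64_t tail_drops; /* packets that did not fit */
};

struct sim_dm {
	struct sim_cpu_queue cpu[SIM_NR_CPUS];
	unsigned int limit;  /* configured maximum queue length */
};

/* Returns 0 if queued, -1 if the packet was tail dropped. */
static int sim_enqueue(struct sim_dm *dm, int cpu)
{
	struct sim_cpu_queue *q = &dm->cpu[cpu];

	if (q->len >= dm->limit) {
		q->tail_drops++;
		return -1;
	}
	q->len++;
	return 0;
}

/* Reported counter: sum of the per-CPU values. */
static uint64_t sim_total_tail_drops(const struct sim_dm *dm)
{
	uint64_t sum = 0;

	for (int i = 0; i < SIM_NR_CPUS; i++)
		sum += dm->cpu[i].tail_drops;
	return sum;
}
```

A steadily growing total suggests the configured queue length is too small for the drop rate.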
    • drop_monitor: Make drop queue length configurable · 30328d46
      Ido Schimmel committed
      In packet alert mode, each CPU holds a list of dropped skbs that need to
      be processed in process context and sent to user space. To avoid
      exhausting the system's memory, the maximum length of this queue is
      currently set to 1000.

      Allow users to tune the length of this queue according to their needs.
      The configured length is reported to user space when the drop monitor
      configuration is queried.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
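A sketch of the configuration rule, assuming the behavior stated in the cover letter: configuring drop monitor while it is monitoring fails with -EBUSY. struct sim_dm_config and its helpers are hypothetical; the default of 1000 comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical drop monitor configuration. */
struct sim_dm_config {
	bool monitoring;        /* tracing currently enabled */
	unsigned int queue_len; /* per-CPU drop list limit */
};

static void sim_config_init(struct sim_dm_config *c)
{
	c->monitoring = false;
	c->queue_len = 1000; /* default limit from the commit message */
}

/* Reject configuration changes while monitoring is active. */
static int sim_set_queue_len(struct sim_dm_config *c, unsigned int len)
{
	if (c->monitoring)
		return -EBUSY;
	c->queue_len = len;
	return 0;
}
```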
    • drop_monitor: Add a command to query current configuration · 444be061
      Ido Schimmel committed
      Users should be able to query the current configuration of drop monitor
      before they start using it. Add a command to query the existing
      configuration, which currently consists of the alert mode and the
      packet truncation length.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Allow truncation of dropped packets · 57986617
      Ido Schimmel committed
      When sending dropped packets to user space, it is not always necessary
      to copy the entire packet, as usually only the headers are of interest.

      Allow users to specify the truncation length and add the original length
      of the packet as additional metadata to the netlink message.

      By default, no truncation is performed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
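A sketch of the truncation logic, assuming that a truncation length of 0 means "no truncation" (the commit only states that no truncation is performed by default). struct sim_report and sim_build_report() are hypothetical; the point is that the copied length is capped while the original length is still reported as metadata.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical report sent to user space. */
struct sim_report {
	size_t orig_len;            /* original packet length (metadata) */
	size_t payload_len;         /* bytes actually copied */
	unsigned char payload[256]; /* fixed buffer, a limit of this sketch only */
};

/* trunc_len == 0 is assumed to mean "copy the whole packet". */
static void sim_build_report(struct sim_report *r, const unsigned char *pkt,
			     size_t len, size_t trunc_len)
{
	size_t copy = len;

	if (trunc_len && trunc_len < copy)
		copy = trunc_len;
	if (copy > sizeof(r->payload))
		copy = sizeof(r->payload);
	r->orig_len = len;
	r->payload_len = copy;
	memcpy(r->payload, pkt, copy);
}
```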
    • drop_monitor: Add packet alert mode · ca30707d
      Ido Schimmel committed
      So far, drop monitor supported only one alert mode, in which a summary of
      locations in which packets were recently dropped was sent to user space.

      This alert mode is sufficient to understand that packets were dropped,
      but it lacks the information needed for a more detailed analysis.

      Add a new alert mode in which the dropped packet itself is passed to
      user space along with metadata: the drop location (as program counter
      and resolved symbol), ingress netdevice and drop timestamp. More
      metadata can be added in the future.

      To avoid performing expensive operations in the context in which
      kfree_skb() is invoked (which can be hard IRQ), the dropped skb is cloned
      and queued on a per-CPU skb drop list. Then, in process context, the
      netlink message is allocated, prepared and finally sent to user space.

      The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting
      the system's memory. Subsequent patches will make this limit
      configurable and also add a counter that indicates how many skbs were
      tail dropped.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
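The split between the drop path and process context can be sketched as a producer/consumer queue. This is a hypothetical, single-threaded userspace model: sim_probe() stands in for the kfree_skb() probe that only copies and enqueues, and sim_work() stands in for the work item that drains the list and builds netlink messages. The kernel clones the skb; this sketch copies only the metadata listed in the commit message.

```c
#include <stdint.h>

#define SIM_QUEUE_MAX 1000 /* fixed limit described in the commit message */

/* Metadata carried with each drop. */
struct sim_drop {
	uintptr_t pc;   /* drop location (program counter) */
	int ifindex;    /* ingress netdevice */
	uint64_t ts_ns; /* drop timestamp */
};

/* Ring buffer standing in for one CPU's skb drop list. */
struct sim_drop_list {
	struct sim_drop items[SIM_QUEUE_MAX];
	unsigned int head, len;
};

/* Fast path: copy and enqueue only; returns -1 when the list is full. */
static int sim_probe(struct sim_drop_list *l, const struct sim_drop *d)
{
	if (l->len >= SIM_QUEUE_MAX)
		return -1;
	l->items[(l->head + l->len) % SIM_QUEUE_MAX] = *d;
	l->len++;
	return 0;
}

/* Slow path: drain up to max entries into out (where the kernel would
 * build and send netlink messages). Returns the number drained. */
static unsigned int sim_work(struct sim_drop_list *l, struct sim_drop *out,
			     unsigned int max)
{
	unsigned int n = 0;

	while (l->len && n < max) {
		out[n++] = l->items[l->head];
		l->head = (l->head + 1) % SIM_QUEUE_MAX;
		l->len--;
	}
	return n;
}
```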
    • drop_monitor: Add alert mode operations · 28315f79
      Ido Schimmel committed
      The next patch is going to add another alert mode in which the dropped
      packet is notified to user space, instead of only a summary of recent
      drops.

      Abstract the differences between the modes by adding alert mode
      operations. The operations are selected based on the currently
      configured mode and associated with the probes and the work item just
      before tracing starts.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
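Alert mode operations amount to a table of function pointers selected once per tracing session. A minimal sketch, assuming two modes and simplified probe signatures; struct sim_alert_ops, sim_ops_arr and the probe functions are hypothetical names, not the ones in drop_monitor.c.

```c
#include <stddef.h>

enum sim_alert_mode { SIM_ALERT_SUMMARY, SIM_ALERT_PACKET, SIM_ALERT_MAX };

/* Per-mode behavior: what to do on a drop and in the deferred work. */
struct sim_alert_ops {
	void (*drop_probe)(unsigned long location);
	void (*work_item)(void);
};

static unsigned int sim_summary_hits, sim_packet_hits;

static void summary_probe(unsigned long loc) { (void)loc; sim_summary_hits++; }
static void summary_work(void) { /* would send the summary alert */ }
static void packet_probe(unsigned long loc) { (void)loc; sim_packet_hits++; }
static void packet_work(void) { /* would send queued packets */ }

static const struct sim_alert_ops sim_summary_ops = { summary_probe, summary_work };
static const struct sim_alert_ops sim_packet_ops = { packet_probe, packet_work };

static const struct sim_alert_ops *const sim_ops_arr[SIM_ALERT_MAX] = {
	[SIM_ALERT_SUMMARY] = &sim_summary_ops,
	[SIM_ALERT_PACKET] = &sim_packet_ops,
};

/* Bind the operations from the configured mode just before tracing starts. */
static const struct sim_alert_ops *sim_trace_on(enum sim_alert_mode mode)
{
	if ((unsigned int)mode >= SIM_ALERT_MAX)
		return NULL;
	return sim_ops_arr[mode];
}
```

Binding once at tracing start keeps the per-drop path free of mode checks.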
    • drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration · c5ab9b1c
      Ido Schimmel committed
      Currently, the configure command does nothing but return an error.
      Subsequent patches will enable the command to change various
      configuration options such as alert mode and packet truncation.

      Similar to other netlink-based configuration channels, make sure only
      users with the CAP_NET_ADMIN capability can execute this command.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Reset per-CPU data before starting to trace · 44075f56
      Ido Schimmel committed
      The function reset_per_cpu_data() allocates and prepares a new skb for
      the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is
      stored in the per-CPU 'data' variable and the old one is returned.

      The function is invoked during module initialization and from the
      workqueue, before an alert is sent. This means that it is possible to
      receive an alert with stale data if tracing was stopped while the
      hysteresis timer ('data->send_timer') was pending.

      Instead of invoking the function during module initialization, invoke it
      just before tracing starts, which ensures a fresh skb.

      This also allows us to remove the calls to initialize the timer and the
      work item from the module initialization path, since both could have
      been triggered by the error paths of reset_per_cpu_data().
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
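A sketch of the "swap in a fresh message, hand back the old one" pattern and of calling it when tracing is enabled. struct sim_summary, struct sim_cpu_data and both functions are hypothetical, and discarding the old message on enable is an assumption of this model rather than a description of the kernel code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the summary alert message under construction. */
struct sim_summary {
	unsigned int entries;
};

struct sim_cpu_data {
	struct sim_summary *msg;
};

/* Install a zeroed message and return the previous one via *old.
 * Returns -1 (and changes nothing) if allocation fails. */
static int sim_reset_per_cpu_data(struct sim_cpu_data *d,
				  struct sim_summary **old)
{
	struct sim_summary *fresh = calloc(1, sizeof(*fresh));

	if (!fresh)
		return -1;
	*old = d->msg;
	d->msg = fresh;
	return 0;
}

/* Tracing enable: start from a fresh message so leftovers from an
 * earlier session are never reported. */
static int sim_trace_start(struct sim_cpu_data *d)
{
	struct sim_summary *old;

	if (sim_reset_per_cpu_data(d, &old))
		return -1;
	free(old);
	return 0;
}
```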
    • drop_monitor: Initialize timer and work item upon tracing enable · 70c69274
      Ido Schimmel committed
      The timer and work item are currently initialized once during module
      init, but subsequent patches will need to associate different functions
      with the work item, based on the configured alert mode.

      Allow subsequent patches to make that change by initializing and
      de-initializing these objects during tracing enable and disable.

      This also guarantees that once the request to disable tracing returns,
      no more netlink notifications will be generated.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Split tracing enable / disable to different functions · 7c747838
      Ido Schimmel committed
      Subsequent patches will need to enable / disable tracing based on the
      configured alert mode.

      Reduce the nesting level and prepare for the introduction of this
      functionality by splitting the tracing enable / disable operations into
      two different functions.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Aug 11, 2019: 18 commits
  3. Aug 10, 2019: 7 commits
    • Merge tag 'mlx5-updates-2019-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 38b9e0f6
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2019-08-09
      
      This series includes updates to the mlx5 Ethernet and core drivers:

      In the first 11 patches, Vlad submits part 2 of a 3-part series to allow
      TC flow handling for concurrent execution.
      
      1) TC flow handling for concurrent execution (part 2)
      
      Vlad says:
      ==========
      
      Refactor data structures that are shared between flows in tc.
      Currently, all cls API hardware offload driver callbacks require the
      caller to hold the rtnl lock when calling them. The cls API has already
      been updated to update software filters in parallel (on classifiers
      that support unlocked execution); however, the hardware offload code
      still obtains the rtnl lock before calling driver tc callbacks. This
      set implements support for unlocked execution of the tc hairpin,
      mod_hdr and encap subsystems. The changes implemented in these
      subsystems are very similar in general.
      
      The main difference is that hairpin is accessed through mlx5e_tc_table
      (legacy mode), mod_hdr is accessed through both mlx5e_tc_table and
      mlx5_esw_offload (legacy and switchdev modes) and encap is only accessed
      through mlx5_esw_offload (switchdev mode).
      
      1.1) Hairpin handling and structure mlx5e_hairpin_entry are refactored
      in the following way:

      - The hairpin structure is extended with an atomic reference counter.
        This approach allows looking up a hairpin entry and obtaining a
        reference to it under hairpin_tbl_lock protection, and then
        continuing to use the entry unlocked (including provisioning it to
        hardware).

      - To support unlocked provisioning of a hairpin entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        hairpin_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same hairpin entry wait for
        the completion first, to prevent access to entries that are not fully
        initialized.

      - The hairpin entry is extended with a new flows_lock spinlock to
        protect the list when multiple concurrent tc instances update flows
        attached to the same hairpin entry.
      
      1.2) Modify header handling code and structure mlx5e_mod_hdr_entry
      are refactored in the following way:

      - The mod_hdr structure is extended with an atomic reference counter.
        This approach allows looking up a mod_hdr entry and obtaining a
        reference to it under mod_hdr_tbl_lock protection, and then
        continuing to use the entry unlocked (including provisioning it to
        hardware).

      - To support unlocked provisioning of a mod_hdr entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        mod_hdr_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same mod_hdr entry wait for
        the completion first, to prevent access to entries that are not fully
        initialized.

      - The mod_hdr entry is extended with a new flows_lock spinlock to
        protect the list when multiple concurrent tc instances update flows
        attached to the same mod_hdr entry.
      
      1.3) Encapsulation handling code and structure mlx5e_encap_entry
      are refactored in the following way:

      - The encap structure is extended with an atomic reference counter.
        This approach allows looking up an encap entry and obtaining a
        reference to it under encap_tbl_lock protection, and then continuing
        to use the entry unlocked (including provisioning it to hardware).

      - To support unlocked provisioning of an encap entry to hardware, the
        entry is extended with a 'res_ready' completion and is inserted into
        encap_tbl before calling the firmware. With this approach, any
        concurrent users that attempt to use the same encap entry wait for
        the completion first, to prevent access to entries that are not fully
        initialized.

      - Unlike the approach used to refactor hairpin and mod_hdr, the encap
        entry is not extended with any per-entry fine-grained lock. Instead,
        encap_tbl_lock is used to synchronize all operations on the encap
        table and instances of mlx5e_encap_entry. This is necessary because a
        single flow can be attached to multiple encap entries simultaneously.
        During new flow creation or a neigh update event, all of the encaps
        that a flow is attached to must be accessed together atomically,
        which makes a per-entry lock infeasible.

      - The encap entry is extended with a new flows_lock spinlock to protect
        the list when multiple concurrent tc instances update flows attached
        to the same encap entry.
      
      ==========
      
      3) Parav improves the way port representors report their parent ID and
      port index.
      
      4) Use refcount_t for refcount in the VXLAN database, from Chuhong Yuan
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
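The lookup, reference and completion scheme described for hairpin, mod_hdr and encap entries can be sketched in userspace with pthreads. Everything here is hypothetical: struct sim_completion emulates the kernel's completion with a mutex and condition variable, and the firmware call is a one-line stand-in. For simplicity the reference count is protected by the table lock; the driver instead uses an atomic reference counter, as described above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Emulated completion: waiters block until sim_complete_all(). */
struct sim_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sim_completion_init(struct sim_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void sim_complete_all(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void sim_wait_for_completion(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* A shared entry, standing in for a hairpin, mod_hdr or encap entry. */
struct sim_entry {
	int key;
	int refcnt;                      /* protected by sim_table.lock */
	struct sim_completion res_ready; /* completed once provisioning ends */
	int hw_status;                   /* valid after res_ready completes */
	struct sim_entry *next;
};

struct sim_table {
	pthread_mutex_t lock;            /* like hairpin_tbl_lock */
	struct sim_entry *head;
};

static void sim_table_init(struct sim_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->head = NULL;
}

/* Look up or create an entry; returns it with a reference held. */
static struct sim_entry *sim_get_entry(struct sim_table *t, int key)
{
	struct sim_entry *e;

	pthread_mutex_lock(&t->lock);
	for (e = t->head; e; e = e->next) {
		if (e->key == key) {
			e->refcnt++;
			pthread_mutex_unlock(&t->lock);
			/* Never use an entry before provisioning has finished. */
			sim_wait_for_completion(&e->res_ready);
			return e;
		}
	}
	e = calloc(1, sizeof(*e));
	if (!e) {
		pthread_mutex_unlock(&t->lock);
		return NULL;
	}
	e->key = key;
	e->refcnt = 1;
	sim_completion_init(&e->res_ready);
	e->next = t->head;
	t->head = e;                     /* published before provisioning */
	pthread_mutex_unlock(&t->lock);

	e->hw_status = key < 0 ? -1 : 0; /* stand-in for the slow firmware call, run unlocked */
	sim_complete_all(&e->res_ready);
	return e;
}

/* Drop a reference; the last user unlinks and frees the entry. */
static void sim_put_entry(struct sim_table *t, struct sim_entry *e)
{
	struct sim_entry **pp;

	pthread_mutex_lock(&t->lock);
	if (--e->refcnt > 0) {
		pthread_mutex_unlock(&t->lock);
		return;
	}
	for (pp = &t->head; *pp != e; pp = &(*pp)->next)
		;
	*pp = e->next;
	pthread_mutex_unlock(&t->lock);
	pthread_cond_destroy(&e->res_ready.cond);
	pthread_mutex_destroy(&e->res_ready.lock);
	free(e);
}
```

Build with -pthread. The entry becomes visible in the table before provisioning, so a concurrent user can find it early, but that user only proceeds once res_ready is completed and should then check hw_status.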
    • 62ad42ec
    • selftests: Fix detection of nettest command in fcnal-test · f887427b
      David Ahern committed
      Most of the tests run by fcnal-test.sh rely on the nettest command.
      Rather than trying to cover all of the individual tests, check for the
      binary only once, at the beginning.

      This also removes the need for log_error, which is undefined.

      Fixes: 6f9d5cac ("selftests: Setup for functional tests for fib and socket lookups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Use refcount_t for refcount · b51c225e
      Chuhong Yuan committed
      refcount_t is better suited for reference counters since its
      implementation can prevent overflows. So convert the atomic_t
      reference counters to refcount_t.
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
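To illustrate why refcount_t is preferred over a bare atomic_t, here is a userspace sketch of saturating semantics built on C11 atomics. sim_refcount_t and its helpers are hypothetical and only approximate the kernel type: increments stop at a saturation value instead of wrapping, and taking a reference on an object whose count has already reached zero is refused. The kernel's refcount_t additionally warns in these cases.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_REF_SATURATED INT_MAX /* sketch-only saturation value */

typedef struct {
	atomic_int refs;
} sim_refcount_t;

/* Take a reference unless the object is already dead (count 0). */
static bool sim_refcount_inc_not_zero(sim_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;
		if (old == SIM_REF_SATURATED)
			return true; /* stay pinned, never wrap around */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the last one was released. */
static bool sim_refcount_dec_and_test(sim_refcount_t *r)
{
	int old = atomic_load(&r->refs);

	do {
		if (old == SIM_REF_SATURATED || old == 0)
			return false; /* saturated: never freed; zero: refuse underflow */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

With a plain atomic_t, the same overflow would wrap the counter and could lead to a premature free; saturation trades a leak for safety.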
    • net/mlx5e: Use vhca_id in generating representor port_index · c938451f
      Parav Pandit committed
      Unique port indices are needed when the devlink instances of multiple
      PCI devices have the same switch-id.

      Make use of vhca-id to generate such unique devlink port indices.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
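A sketch of how a vhca_id can make port indices unique across devices that share a switch-id, assuming the index packs vhca_id into the upper 16 bits and the vport number into the lower 16 bits. This layout and sim_rep_port_index() are assumptions for illustration, not necessarily the encoding the driver uses.

```c
#include <stdint.h>

/* Hypothetical encoding: vhca_id in bits 31..16, vport in bits 15..0. */
static uint32_t sim_rep_port_index(uint16_t vhca_id, uint16_t vport)
{
	return ((uint32_t)vhca_id << 16) | vport;
}
```

The same vport number on two PCI functions then yields two different indices, because each function has its own vhca_id.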
    • net/mlx5e: Simplify querying port representor parent id · 724ee179
      Parav Pandit committed
      The system image GUID doesn't depend on the eswitch switchdev mode.

      Hence, remove the check, which simplifies the code.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-switch, Removed unused hwid · ef2e4094
      Parav Pandit committed
      Currently, mlx5_eswitch_rep stores the same hw ID for all representors.
      However, it is never used from this structure; it is always used from
      mlx5_vport.

      Hence, remove the unused field.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>