1. 20 Dec, 2019 22 commits
  2. 19 Dec, 2019 18 commits
    • J
      net: stmmac: tc: Fix TAPRIO division operation · a1ec57c0
      Jose Abreu committed
      On architectures that do not support 64-bit division, we need to use
      the division helpers.
      
      Fixes: b60189e0 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1ec57c0
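
On a 32-bit target, dividing a 64-bit value with the plain `/` operator typically compiles to a call into a compiler support routine that the kernel does not provide, which is why helper functions are used instead. The commit message does not say which helper was chosen. The following is only a minimal Python sketch of the shift-and-subtract idea behind dividing a 64-bit value by a 32-bit one; the name `div64_by_32` is hypothetical and is not a kernel function.

```python
def div64_by_32(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide an unsigned 64-bit value by an unsigned 32-bit value.

    Returns (quotient, remainder) using bitwise long division, so no
    native 64-bit divide instruction is needed.
    """
    if not 0 <= dividend < 1 << 64:
        raise ValueError("dividend must fit in 64 bits")
    if not 0 < divisor < 1 << 32:
        raise ValueError("divisor must be a non-zero 32-bit value")

    quotient = 0
    remainder = 0
    # Walk the dividend from the most significant bit down.
    for bit in range(63, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    # Example: convert a nanosecond interval to whole microseconds.
    print(div64_by_32(10**18, 1000))
```

The result can be cross-checked against Python's built-in `divmod()` for any in-range pair of inputs.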
    • D
      Merge branch 'ETS-qdisc' · 6bff0017
      David S. Miller committed
      Petr Machata says:
      
      ====================
      Add a new Qdisc, ETS
      
      The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
      transmission selection algorithms: strict priority, credit-based shaper,
      ETS (bandwidth sharing), and vendor-specific. All these have their
      corresponding knobs in DCB. But DCB does not have interfaces to configure
      RED and ECN, unlike Qdiscs.
      
      In the Qdisc land, strict priority is implemented by PRIO. Credit-based
      transmission selection algorithm can then be modeled by having e.g. TBF or
      CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
      placing a DRR Qdisc under the last PRIO band.
      
      The problem with this approach is that DRR on its own, as well as the
      combination of PRIO and DRR, are tricky to configure and tricky to offload
      to 802.1Qaz-compliant hardware. This is due to several reasons:
      
      - As any classful Qdisc, DRR supports adding classifiers to decide in which
        class to enqueue packets. Unlike PRIO, there's however no fallback in the
        form of priomap. A way to achieve classification based on packet priority
        is e.g. like this:
      
          # tc filter add dev swp1 root handle 1: \
              basic match 'meta(priority eq 0)' flowid 1:10
      
        Expressing the priomap in this manner however forces drivers to deep dive
        into the classifier block to parse the individual rules.
      
        A possible solution would be to extend the classes with a "defmap" a la
        split / defmap mechanism of CBQ, and introduce this as a last resort
        classification. However, unlike priomap, this doesn't have the guarantee
        of covering all priorities. Traffic whose priority is not covered is
        dropped by DRR as unclassified. But ASICs tend to implement dropping in
        the ACL block, not in scheduling pipelines. The need to treat these
        configurations correctly (if only to decide to not offload at all)
        complicates a driver.
      
        It's not clear how to retrofit priomap with all its benefits to DRR
        without changing it beyond recognition.
      
      - The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
        all ETS TCs as a last resort. Switch ASICs that support ETS at all are
        likely to handle ETS traffic this way as well. However, the Linux model
        is more generic, allowing the DRR block in any band. Drivers would need
        to be careful to handle this case correctly, otherwise the offloaded
        model might not match the slow-path one.
      
        In a similar vein, PRIO and DRR need to agree on the list of priorities
        assigned to DRR. This is doubly problematic--the user needs to take care
        to keep the two in sync, and the driver needs to watch for any holes in
        DRR coverage and treat the traffic correctly, as discussed above.
      
        Note that at the time that DRR Qdisc is added, it has no classes, and
        thus any priorities assigned to that PRIO band are not covered. Thus this
        case is surprisingly rather common, and needs to be handled gracefully by
        the driver.
      
      - Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
        below it, it is not immediately clear which TC the class represents. This
        is unlike PRIO with its straightforward classid scheme. When DRR is
        combined with PRIO, the relationship between classes and TCs gets even
        more murky.
      
        This is a problem for users as well: the TC mapping is rather important
        for (devlink) shared buffer configuration and (ethtool) counters.
      
      So instead, this patch set introduces a new Qdisc, which is based on
      802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
      needs to specify how many bands there are, how many are strict and how many
      are ETS, quanta for the latter, and priomap.
      
      The new Qdisc operates like the PRIO / DRR combo would when configured as
      per the standard. The strict classes, if any, are tried for traffic first.
      When there's no traffic in any of the strict queues, the ETS ones (if any)
      are treated in the same way as in DRR.
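
The dequeue order described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the code from sch_ets.c: the class name `EtsSketch` and its methods are hypothetical, packets are represented only by their size in bytes, and every quantum is assumed to be positive.

```python
from collections import deque


class EtsSketch:
    """Toy model: strict bands first, then deficit round robin (DRR)."""

    def __init__(self, nstrict, quanta):
        self.nstrict = nstrict
        self.quanta = list(quanta)  # one quantum per bandwidth-sharing band
        nbands = nstrict + len(self.quanta)
        self.queues = [deque() for _ in range(nbands)]
        self.deficit = [0] * nbands
        self.active = deque()  # backlogged sharing bands, round-robin order

    def enqueue(self, band, size):
        if band >= self.nstrict and not self.queues[band]:
            # A sharing band that becomes backlogged joins the round
            # with a fresh quantum.
            self.active.append(band)
            self.deficit[band] = self.quanta[band - self.nstrict]
        self.queues[band].append(size)

    def dequeue(self):
        # Strict bands are tried first, lowest band number first.
        for band in range(self.nstrict):
            if self.queues[band]:
                return band, self.queues[band].popleft()
        # No strict traffic: serve the sharing bands DRR-style.
        while self.active:
            band = self.active[0]
            size = self.queues[band][0]
            if size <= self.deficit[band]:
                self.deficit[band] -= size
                self.queues[band].popleft()
                if not self.queues[band]:
                    self.active.popleft()
                return band, size
            # Not enough credit: top up and move to the back of the round.
            self.deficit[band] += self.quanta[band - self.nstrict]
            self.active.rotate(-1)
        return None


if __name__ == "__main__":
    ets = EtsSketch(nstrict=0, quanta=[4500, 3000, 2500])
    for band in range(3):
        for _ in range(1000):
            ets.enqueue(band, 100)
    counts = [0, 0, 0]
    for _ in range(1000):
        band, _size = ets.dequeue()
        counts[band] += 1
    print(counts)  # [450, 300, 250]
```

With equally sized packets and all bands permanently backlogged, the packet counts follow the quanta ratio, which is how quanta of 4500, 3000 and 2500 translate into 45%-30%-25% shares.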
      
      The chosen interface makes the overall system both reasonably easy to
      configure, and reasonably easy to offload. The extra code to support ETS in
      mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
      lines is bona fide new business logic.
      
      The credit-based shaper transmission selection algorithm can be
      configured by adding a CBS Qdisc under one of the strict bands (TBF can
      be used to similar effect). As a non-work-conserving Qdisc, CBS can't be
      hooked under the ETS bands. This is detected and handled identically to
      DRR Qdisc at runtime. Note that offloading CBS is outside the scope of
      this patchset.
      
      The patchset proceeds in four stages:
      
      - Patches #1-#3 are cleanups.
      - Patches #4 and #5 contain the new Qdisc.
      - Patches #6 and #7 update mlxsw to offload the new Qdisc.
      - Patches #8-#10 add selftests for ETS.
      
      Examples:
      
      - Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:
      
          # tc qdisc add dev swp1 root handle 1: \
              ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
      
      - Tweak quantum of one of the classes of the previous Qdisc:
      
          # tc class ch dev swp1 classid 1:4 ets quantum 1000
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
          # tc class ch dev swp1 classid 1:3 ets quantum 1000
          Error: Strict bands do not have a configurable quantum.
      
      - Purely strict Qdisc with 1:1 mapping between priorities and TCs:
      
          # tc qdisc add dev swp1 root handle 1: \
              ets strict 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
      
      - Use "bands" to specify number of bands explicitly. Underspecified bands
        are implicitly ETS and their quantum is taken from MTU. The following
        thus gives each band the same weight:
      
          # tc qdisc add dev swp1 root handle 1: \
              ets bands 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
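
The way the examples above expand a short command line into the full configuration printed by `tc qdisc sh` can be modeled as follows. This is a sketch inferred from the example outputs, not code from the kernel or iproute2: the function names are hypothetical, the rule "bands = strict + number of quanta when `bands` is omitted" is read off the examples, and the default quantum of 1514 is simply the MTU-derived value shown in the last example.

```python
PRIOMAP_LEN = 16  # the example outputs always list 16 priomap entries


def expand_ets_config(strict=0, quanta=(), priomap=(), bands=None,
                      default_quantum=1514):
    """Fill in the defaults that the example outputs display."""
    quanta = list(quanta)
    priomap = list(priomap)
    nbands = bands if bands is not None else strict + len(quanta)
    if strict + len(quanta) > nbands:
        raise ValueError("more strict bands and quanta than bands")
    if len(priomap) > PRIOMAP_LEN or any(b >= nbands for b in priomap):
        raise ValueError("invalid priomap")
    # Bands that are neither strict nor given a quantum share bandwidth
    # with a default quantum.
    quanta += [default_quantum] * (nbands - strict - len(quanta))
    # Priorities not mentioned in the priomap fall into the last band.
    priomap += [nbands - 1] * (PRIOMAP_LEN - len(priomap))
    return {"bands": nbands, "strict": strict,
            "quanta": quanta, "priomap": priomap}


def change_quantum(config, classid_minor, quantum):
    """Model of 'tc class ch ... classid 1:<minor> ets quantum <q>'."""
    band = classid_minor - 1  # class 1:4 is band 3 in the example above
    if not 0 <= band < config["bands"]:
        raise ValueError("no such class")
    if band < config["strict"]:
        raise ValueError("Strict bands do not have a configurable quantum.")
    config["quanta"][band - config["strict"]] = quantum


if __name__ == "__main__":
    cfg = expand_ets_config(strict=3, quanta=[4500, 3000, 2500],
                            priomap=[0, 1, 1, 1, 2, 3, 4, 5])
    change_quantum(cfg, 4, 1000)
    print(cfg)
```

Calling `change_quantum(cfg, 3, 1000)` on the same configuration raises an error, mirroring the rejected `classid 1:3` command in the second example.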
      
      v2:
      - This addresses points raised by David Miller.
      - Patch #4:
          - sch_ets.c: Add a comment with description of the Qdisc and the
            dequeuing algorithm.
          - Kconfig: Add a high-level description to the help blurb.
      
      v1:
      - No changes, first upstream submission after RFC.
      
      v3 (internal):
      - This addresses review from Jiri Pirko.
      - Patch #3:
          - Rename to _HR_ instead of to _HIERARCHY_.
      - Patch #4:
          - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
          - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
            _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
          - sch_ets.c: Update to reflect the above changes. Add a new policy,
            ets_class_policy, which is used when parsing class changes.
            Currently that policy is the same as the quanta policy, but that
            might change.
          - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
            caller that makes use of it.
          - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
            attribute instead of returning an extack.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
            function in this patch already. Drop the weight computation.
          - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
            pass for the abovementioned "weights".
          - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
            __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
            directly from mlxsw_sp_setup_tc_prio().
          - Update to follow the _HIERARCHY_ -> _HR_ renaming.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
            weight computation removal are now done in a previous patch.
          - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
            earlier in the function.
      - Patch #3 (iproute2):
          - Add an example output to the commit message.
          - tc-ets.8: Fix output of two examples.
          - tc-ets.8: Describe default values of "bands", "quanta".
          - q_ets.c: A number of fixes in error messages.
          - q_ets.c: Comment formatting: /*padding*/ -> /* padding */
          - q_ets.c: parse_nbands: Move duplicate checking to callers.
          - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.
      
      v2 (internal):
      - This addresses review from Ido Schimmel and comments from Alexander
        Kushnarov.
      - Patch #2:
          - s/coment/comment in the commit message.
      - Patch #4:
          - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
          - ets_class_find(): RXTify
      - Patch #3 (iproute2):
          - tc-ets.8: some spelling fixes
          - tc-ets.8: add another example
          - tc.8: add an ETS to "CLASSFUL QDISCS" section
      
      v1 (internal):
      - This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
        by Alexander Petrovskiy and myself, and other improvements.
      - Patch #2:
          - Expand the explanation with an explicit example.
      - Patch #4:
          - Kconfig: s/sch_drr/sch_ets/
          - sch_ets: Reorder includes to be in alphabetical order
          - sch_ets: ets_quantum_parse(): Rename the return-pointer argument
            from pquantum to quantum, and use it directly, not going through a
            local temporary.
          - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
            argument "quanta" from an array to a pointer.
          - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
          - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
            __nla_validate_nested directly instead of nl80211_validate_nested().
          - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
            instead of returning an extack.
          - sch_ets: ets_qdisc_change(): Make the last band the default one for
            unmentioned priomap priorities.
          - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
            band notified its ETS parent.
          - sch_ets: When ungrafting, add the newly-created invisible FIFO to
            the Qdisc hash
      - Patch #5:
          - pkt_cls.h: Note that quantum=0 signifies a strict band.
          - Fix error path handling when ets_offload_dump() fails.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
            "quanta" and "priomap" from arrays to pointers.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
            "weights" from an array to a pointer.
      - Patch #9:
          - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
          - Adjust the whole suite to allow testing of traffic classifiers
            in addition to testing priomap.
      - Patch #10:
          - Add a number of new tests to test default priomap band, overlarge
            number of bands, zeroes in quanta, and altogether missing quanta.
      - Patch #1 (iproute2):
          - State motivation for inclusion of this patch in the patchset in the
            commit message.
      - Patch #3 (iproute2):
          - tc-ets.8: it is now December
          - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
          - tc-ets.8: s/flow/band in explanation of quantum
          - tc-ets.8: explain what happens with priorities not covered by priomap
          - tc-ets.8: default priomap band is now the last one
          - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
            priomap and quanta.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bff0017
    • P
      selftests: qdiscs: Add test coverage for ETS Qdisc · 82c664b6
      Petr Machata committed
      Add TDC coverage for the new ETS Qdisc.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82c664b6
    • P
      selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc · ddd3fd75
      Petr Machata committed
      This tests the newly-added ETS Qdisc. It runs two to three streams of
      traffic, each with a different priority. ETS Qdisc is supposed to allocate
      bandwidth according to the DRR algorithm and given weights. After running
      the traffic for a while, counters are compared for each stream to check
      that the expected ratio is in fact observed.
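
The counter comparison described above amounts to checking each stream's share of the total against its weight. The selftest itself is a shell script; the following Python sketch only illustrates the idea, and both the function name `shares_match` and the 10% relative tolerance are hypothetical choices, not values taken from the test.

```python
def shares_match(counters, weights, tolerance=0.1):
    """Return True if each stream's observed share of traffic is within
    a relative tolerance of the share its weight entitles it to."""
    if len(counters) != len(weights):
        raise ValueError("need one weight per counter")
    total_count = sum(counters)
    total_weight = sum(weights)
    if total_count == 0 or total_weight == 0:
        return False
    for count, weight in zip(counters, weights):
        expected = weight / total_weight
        observed = count / total_count
        if abs(observed - expected) > tolerance * expected:
            return False
    return True


if __name__ == "__main__":
    # Three streams with quanta 4500, 3000 and 2500.
    print(shares_match([4510, 2990, 2500], [4500, 3000, 2500]))
```

A relative tolerance is used here so that low-weight streams are held to a proportionally tight bound; an absolute bound would be an equally reasonable design.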
      
      In order for the DRR process to kick in, a traffic bottleneck must exist in
      the first place. In slow path, such bottleneck can be implemented by
      wrapping the ETS Qdisc inside a TBF or other shaper. This might however
      make the configuration unoffloadable. Instead, on HW datapath, the
      bottleneck would be set up by lowering port speed and configuring shared
      buffer suitably.
      
      Therefore the test is structured as a core component that implements the
      testing, with two wrapper scripts that implement the details of slow path
      resp. fast path configuration.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ddd3fd75
    • P
      selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh · 4cf9b8f9
      Petr Machata committed
      These two functions are used for starting several streams of traffic, and
      then stopping them later. They will be handy for the test coverage of ETS
      Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4cf9b8f9
    • P
      mlxsw: spectrum_qdisc: Support offloading of ETS Qdisc · 19f405b9
      Petr Machata committed
      Handle TC_SETUP_QDISC_ETS, add a new ops structure for the ETS Qdisc.
      Invoke the extended prio handlers implemented in the previous patch. For
      stats ops, invoke directly the prio callbacks, which are not sensitive to
      differences between PRIO and ETS.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19f405b9
    • P
      mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETS · 7917f52a
      Petr Machata committed
      Thanks to the similarity between PRIO and ETS it is possible to simply
      reuse most of the code for offloading PRIO Qdisc. Extract the common
      functionality into separate functions, making the current PRIO handlers
      thin API adapters.
      
      Extend the new functions to pass quanta for individual bands, which allows
      configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as
      appropriate to de/configure WRR-ness and weight of individual bands.
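
Hardware WRR weights are commonly expressed as integer percentages, whereas the Qdisc is configured with byte quanta. This log does not describe how the conversion is actually done, so the following is only one plausible approach, sketched in Python with a hypothetical function name. It treats a quantum of 0 as marking a strict band and derives each weight from a running prefix sum so that the integer percentages of the sharing bands add up to exactly 100.

```python
def quanta_to_weights(quanta):
    """Convert per-band quanta to integer percentage weights.

    A quantum of 0 marks a strict band and gets weight 0. Weights are
    differences of cumulative percentages, so rounding errors do not
    accumulate and the sharing bands sum to 100.
    """
    total = sum(quanta)
    weights = []
    running = 0
    prev_percent = 0
    for quantum in quanta:
        if quantum == 0:
            weights.append(0)
            continue
        running += quantum
        percent = running * 100 // total
        weights.append(percent - prev_percent)
        prev_percent = percent
    return weights


if __name__ == "__main__":
    # Three strict bands followed by three bandwidth-sharing bands.
    print(quanta_to_weights([0, 0, 0, 4500, 3000, 2500]))
    # [0, 0, 0, 45, 30, 25]
```

Rounding each band independently could leave the total at 99 for three equal quanta; the prefix-sum form yields 33, 33 and 34 instead.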
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7917f52a
    • P
      net: sch_ets: Make the ETS qdisc offloadable · d35eb52b
      Petr Machata committed
      Add hooks at appropriate points to make it possible to offload the ETS
      Qdisc.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d35eb52b
    • P
      net: sch_ets: Add a new Qdisc · dcc68b4d
      Petr Machata committed
      Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is
      PRIO-like in how it is configured, meaning one needs to specify how many
      bands there are, how many are strict and how many are dwrr, quanta for the
      latter, and priomap.
      
      The new Qdisc operates like the PRIO / DRR combo would when configured as
      per the standard. The strict classes, if any, are tried for traffic first.
      When there's no traffic in any of the strict queues, the ETS ones (if any)
      are treated in the same way as in DRR.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dcc68b4d
    • P
      mlxsw: spectrum: Rename MLXSW_REG_QEEC_HIERARCY_* enumerators · 9cf9b925
      Petr Machata committed
      These enums want to be named MLXSW_REG_QEEC_HIERARCHY_, but due to a typo
      lack the second H. That is confusing and complicates searching.
      
      But actually the enumerators should be named _HR_, because that is how
      their enum type is called. So rename them as appropriate.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cf9b925
    • P
      mlxsw: spectrum_qdisc: Clarify a comment · 5bc146c9
      Petr Machata committed
      Expand the comment at mlxsw_sp_qdisc_prio_graft() to make the problem that
      this function is trying to handle clearer.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bc146c9
    • P
      net: pkt_cls: Clarify a comment · 9586a992
      Petr Machata committed
      The bit about negating HW backlog left me scratching my head. Clarify the
      comment.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9586a992
    • K
      sch_cake: drop unused variable tin_quantum_prio · cbd22f17
      Kevin 'ldir' Darbyshire-Bryant committed
      Turns out tin_quantum_prio isn't used anymore and is a leftover from a
      previous implementation of diffserv tins.  Since the variable isn't used
      in any calculations it can be eliminated.
      
      Drop variable and places where it was set.  Rename remaining variable
      and consolidate naming of intermediate variables that set it.
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbd22f17
    • D
      Merge branch 's390-next' · dcbe4e95
      David S. Miller committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: features 2019-12-18
      
      please apply the following patch series to your net-next tree.
      Nothing major, just the usual mix of small improvements and cleanups.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dcbe4e95
    • J
      s390/qeth: make use of napi_schedule_irqoff() · 334b49de
      Julian Wiedmann committed
      qeth_qdio_start_poll() is called from the qdio layer's IRQ handler,
      while IRQs are masked, so the napi_schedule_irqoff() variant can be
      used.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      334b49de
    • J
      s390/qeth: consolidate helpers for capability checking · 52f82bf1
      Julian Wiedmann committed
      Convert the old code to use struct qeth_ipa_caps, and while at it remove
      all unused helper macros.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52f82bf1
    • J
      s390/qeth: stop yielding the ip_lock during IPv4 registration · adee2592
      Julian Wiedmann committed
      As commit df2a2a52 ("s390/qeth: convert IP table spinlock to mutex")
      converted the ip_lock to a mutex, we no longer have to yield it while
      the subsequent IO sleep-waits for completion.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      adee2592
    • J
      s390/qeth: don't raise NETDEV_REBOOT event from L3 offline path · b6beb62a
      Julian Wiedmann committed
      This is a leftover from back when a recovery action didn't go through
      dev_close(), and was meant to shoot down all remaining af_iucv sockets
      on the interface.
      
      Now that the offline path always calls dev_close(), the
      NETDEV_GOING_DOWN event from __dev_close_many() is sufficient and this
      hack can be removed.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6beb62a