1. 01 Aug, 2017 20 commits
  2. 31 Jul, 2017 12 commits
    • D
      Merge branch 'net-sched-actions-improve-dump-performance' · 764646b0
      David S. Miller committed on
      Jamal Hadi Salim says:
      
      ====================
      net sched actions: improve dump performance
      
      Changes since v11:
      ------------------
      1) Jiri - renames: nla_value to value and nla_selector to selector
      2) Jiri - rename: validate_nla_bitfield_32 to validate_nla_bitfield32
      3) Jiri - rename: NLA_BITFIELD_32 to NLA_BITFIELD32
      4) Jiri - remove unnecessary break when we return in case statement
      5) Jiri - rename and move nla_get_bitfield_32 to an earlier patch
      6) Jiri - xmas tree alignment of var declaration
      7) Jiri - rename all declarations of bitfield 32 vars to be consistent ("bf")
      8) Jiri - improve validate_nla_bitfield32() validation to disallow valid
                bit values that are not selected by the selector
      
      Changes since v10:
      -----------------
      1) Jiri: move type->validate_content() to its own patch
      Jamal: decided to remove it altogether so we can get this patch set in.
      
      2) Change name of NLA_FLAG_BITS to NLA_BITFIELD_32 based on discussions
      with D. Ahern and Jiri. D. Ahern suggests to make this a variable bitmap size.
      My analysis at this point is that it is too complex and I only need a few bit
      flags. If we run out of bits someone else can create a new NLA_BITFIELD_XXX
      and start using that. So please let this go.
      
      3) Jamal - Add Suggested-by: Jiri for type NLA_BITFIELD_32
      
      4) Jiri: Change name allowed_flags to tcaa_root_flags_allowed
      
      5) Jiri: Introduce nla_get_flag_bits_values() helper instead of using
      memcpy for retrieving nla_bitfield_32 fields.
      
      Changes since v9:
      -----------------
      
      1) General consensus:
      - remove again the use of BIT() to maintain uapi consistency ;->
      
      2) Jiri:
      - Add a new netlink type NLA_FLAG_BITS to check for valid bits
        and use it instead of inline vetting (patch 4/4 now)
      
      Changes since v8:
      -----------------
      
      1) Jiri:
      - Add back the use of BIT(). Eventually fix iproute2 instead
      - Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS
      
      Changes since v7:
      -----------------
      
      Jamal:
      No changes.
      Patch 1 went out twice. Resend without two copies of patch 1
      
      Changes since v6:
      -----------------
      
      1) DaveM:
      New rules for netlink messages. From now on we are going to start
      checking for bits that are not used and rejecting anything we don't
      understand. In the future this is going to require major changes
      to user space code (tc etc). This is just a start.
      
      To quote, David:
      "
       Again, bits you aren't using now, make sure userspace doesn't
         set them.  And if it does, reject.
      "
      Added checks for ensuring things work as above.
      
      2) Jiri:
      a)Fix the commit message to properly use "Fixes" description
      b)Align assignments for nla_policy
      
      Changes since v5:
      ----------------
      
      0)
      Remove use of BIT() because it is kernel specific. Requires a separate
      patch (Jiri can submit that in his cleanups)
      
      1)To paraphrase Eric D.
      
      "memcpy(nla_data(count_attr), &cb->args[1], sizeof(u32));
      won't work on 64-bit BE machines because cb->args[1]
      (which is 64 bit) is larger in size than sizeof(u32)"
      
      Fixed
      
      2) Jiri Pirko
      
      i) Spotted a fix for the wrong TLV mixed into the patch.
      Added patch 1/3 to address this. Made it part of this
      series because of dependencies.
      
      ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON
      
      iii) Satisfy Jiri's obsession against the noun "tcaa"
      a)Rename struct nlattr *tcaa --> struct nlattr *tb
      b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX
      
      Changes since v4:
      -----------------
      
      1) Eric D.
      
      pointed out that when all skb space is used up by the dump
      there will be no space to insert the TCAA_ACT_COUNT attribute.
      
      2) Jiri:
      
      i) Change:
      
      enum {
              TCAA_UNSPEC,
              TCAA_ACT_TAB,
              TCAA_ACT_FLAGS,
              TCAA_ACT_COUNT,
              TCAA_ACT_TIME_FILTER,
              __TCAA_MAX
      };
      
      to:
      enum {
             TCAA_UNSPEC,
             TCAA_ACT_TAB,
             TCAA_ACT_FLAGS,
             TCAA_ACT_COUNT,
             __TCAA_MAX,
      };
      
      Jiri plans to followup with the rest of the code to make the
      style consistent.
      
      ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA
      
      iii) Rename variable jiffy_filter --> jiffy_since
      iv) Rename msecs_filter --> msecs_since
      v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]
      
      Earlier Changes
      ----------------
      - Jiri mostly on names of things.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      764646b0
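The 64-bit big-endian problem quoted in the v5 notes can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: `put_count_u32()` and `put_count_u32_buggy()` are hypothetical names, and the `unsigned long long` parameter stands in for the 64-bit `cb->args[1]` slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the count held in a 64-bit slot
 * (standing in for cb->args[1]) into a 4-byte attribute payload. */
static void put_count_u32(void *payload, unsigned long long slot)
{
	/* Converting the value (not the bytes) is endian-independent. */
	uint32_t count = (uint32_t)slot;

	assert(payload != NULL);
	memcpy(payload, &count, sizeof(count));
}

/* The pattern the review objected to, kept for contrast: it copies the
 * first four bytes in memory, which hold the high half of the value on
 * a big-endian 64-bit machine. */
static void put_count_u32_buggy(void *payload, unsigned long long slot)
{
	assert(payload != NULL);
	memcpy(payload, &slot, sizeof(uint32_t));
}
```

On a little-endian machine both helpers happen to produce the same payload for small counts, which is why the bug is easy to miss in testing.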
    • J
      net sched actions: add time filter for action dumping · e62e484d
      Jamal Hadi Salim committed on
      This patch adds support for filtering based on time since last used.
      When we are dumping a large number of actions it is useful to
      have the option of filtering based on when the action was last
      used to reduce the amount of data crossing to user space.
      
      With this patch the user space app sets the TCA_ROOT_TIME_DELTA
      attribute with the value in milliseconds with "time of interest
      since now".  The kernel converts this to jiffies and does the
      filtering comparison matching entries that have seen activity
      since then and returns them to user space.
      Old kernels and old tc continue to work in legacy mode since
      they don't specify this attribute.

      An example (we have 400 actions bound to 400 filters) at
      installation time, using an updated tc and setting the time of
      interest to 120 seconds earlier (we see 400 actions):
      prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
      400
      
      go get some coffee and wait for > 120 seconds and try again:
      
      prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
      0
      
      Let's see a filter bound to one of these actions:
      ....
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
        match 7f000002/ffffffff at 12 (success 1 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1145 sec used 802 sec
          Action statistics:
          Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      ....
      
      That coffee took long, no? It was good.

      Now let's ping -c 1 127.0.0.2, then run the actions again:
      prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
      1
      
      More details please:
      prompt$ hackedtc -s actions ls action gact since 120000
      
          action order 0: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1270 sec used 30 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      
      And the filter?
      
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
        match 7f000002/ffffffff at 12 (success 2 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1324 sec used 84 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e62e484d
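The "milliseconds in, jiffies comparison inside" flow described in this commit can be sketched as follows. This is a simplified user-space model under stated assumptions: `SKETCH_HZ`, `sketch_msecs_to_ticks()` and `used_since()` are hypothetical names, the tick rate of 250 is an arbitrary choice, and the kernel's own conversion and comparison helpers may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate for the sketch; the real value is a kernel config choice. */
#define SKETCH_HZ 250u

/* Convert milliseconds to ticks, rounding up (a simplified stand-in
 * for the kernel's millisecond-to-jiffies conversion). */
static uint32_t sketch_msecs_to_ticks(uint32_t msecs)
{
	return (uint32_t)(((uint64_t)msecs * SKETCH_HZ + 999u) / 1000u);
}

/* True if the action was last used within the past `since_msecs`
 * milliseconds. Unsigned subtraction keeps the elapsed-time computation
 * correct when the tick counter wraps around. */
static bool used_since(uint32_t now, uint32_t last_used, uint32_t since_msecs)
{
	uint32_t elapsed = now - last_used;

	assert(SKETCH_HZ > 0u);
	return elapsed <= sketch_msecs_to_ticks(since_msecs);
}
```

With these assumptions, an action last used 30 seconds ago passes a `since 120000` filter, while one last used 802 seconds ago is skipped, matching the shape of the transcript above.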
    • J
      net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch · 90825b23
      Jamal Hadi Salim committed on
      When you dump hundreds of thousands of actions, getting only 32 per
      dump batch even when the socket buffer and memory allocations allow
      is inefficient.
      
      With this change, the user will get as many actions as fit
      within the given constraints available to the kernel.
      
      The top level action TLV space is extended. An attribute
      TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
      is set by the user indicating the user is capable of processing
      these large dumps. Older user space which doesn't set this flag
      doesn't get the larger (than 32) batches.
      The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
      actions are put in a single batch. As such, the user space app knows how
      long to iterate (independent of the type of action being dumped)
      instead of relying on a hardcoded maximum of 32, thus maintaining
      backward compat.
      
      Some results dumping 1.5M actions below:
      first an unpatched tc which doesn't understand these features...
      
      prompt$ time -p tc actions ls action gact | grep index | wc -l
      1500000
      real 1388.43
      user 2.07
      sys 1386.79
      
      Now let's see a patched tc which sets the correct flags when requesting
      a dump:
      
      prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
      1500000
      real 178.13
      user 2.02
      sys 176.96
      
      That is about 8x performance improvement for tc app which sets its
      receive buffer to about 32K.
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90825b23
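The numbers in this commit can be checked with a little arithmetic. The sketch below is only a calculation aid; `batches_needed()` and `speedup()` are hypothetical helper names, and the inputs are the figures quoted in the commit message.

```c
#include <assert.h>

/* Number of dump batches needed when each batch carries at most
 * `per_batch` actions (ceiling division). */
static unsigned long batches_needed(unsigned long total, unsigned long per_batch)
{
	assert(per_batch > 0);
	return (total + per_batch - 1) / per_batch;
}

/* Ratio of the two wall-clock times reported by `time -p`. */
static double speedup(double before_secs, double after_secs)
{
	assert(after_secs > 0.0);
	return before_secs / after_secs;
}
```

At the legacy limit of 32 actions per batch, `batches_needed(1500000, 32)` gives 46875 batches for the 1.5M-action dump, and `speedup(1388.43, 178.13)` comes out a little under 8, consistent with the "about 8x" claim.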
    • J
      net sched actions: Use proper root attribute table for actions · df823b02
      Jamal Hadi Salim committed on
      Bug fix for an issue which has been around for about a decade.
      We got away with it because the enumeration was larger than needed.
      
      Fixes: 7ba699c6 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df823b02
    • J
      net netlink: Add new type NLA_BITFIELD32 · 64c83d83
      Jamal Hadi Salim committed on
      Generic bitflags attribute content sent to the kernel by the user.
      With this netlink attr type the user can either set or unset a
      flag in the kernel.

      The value is a bitmap that defines the bit values being set.
      The selector is a bitmask that defines which value bits are to be
      considered.

      A check is made to ensure that a kernel subsystem only ever
      accepts bitflags the kernel already knows about, i.e.
      if the user tries to set a bit flag that is not understood then
      _it will be rejected_.
      
      In the most basic form, the user specifies the attribute policy as:
      [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
      
      where myvalidflags is the bit mask of the flags the kernel understands.
      
      If the user _does not_ provide myvalidflags then the attribute will
      also be rejected.
      
      Examples:
      value = 0x0, and selector = 0x1
      implies we are selecting bit 1 and we want to set its value to 0.
      
      value = 0x2, and selector = 0x2
      implies we are selecting bit 2 and we want to set its value to 1.
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64c83d83
    • A
      net: fec: Allow reception of frames bigger than 1522 bytes · fbbeefdd
      Andrew Lunn committed on
      The FEC Receive Control Register has a 14 bit field indicating the
      longest frame that may be received. It is being set to 1522. Frames
      longer than this are discarded, but counted as being in error.
      
      When using DSA, frames from the switch have an additional header,
      either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
      of 1522 bytes received by the switch on a port becomes 1530 bytes when
      passed to the host via the FEC interface.
      
      Change the maximum receive size to 2048 - 64, where 64 is the maximum
      rx_alignment applied on the receive buffer for AVB capable FEC
      cores. Use this value also for the maximum receive buffer size. The
      driver is already allocating a receive SKB of 2048 bytes, so this
      change should not have any significant effects.
      
      Tested on imx51, imx6, vf610.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbbeefdd
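The frame-size reasoning in this commit reduces to a few constants. The following sketch only restates the arithmetic from the commit message; all `SKETCH_*` names and `frame_accepted()` are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RX_BUF_SIZE  2048u  /* receive SKB size named in the text */
#define SKETCH_RX_ALIGN_MAX 64u    /* largest rx_alignment named in the text */
#define SKETCH_MAX_FL_OLD   1522u  /* previous maximum frame length */
#define SKETCH_MAX_FL_NEW   (SKETCH_RX_BUF_SIZE - SKETCH_RX_ALIGN_MAX)
#define SKETCH_MAX_FL_FIELD ((1u << 14) - 1u) /* largest value of a 14-bit field */

/* True if a frame of `frame_len` bytes plus a DSA header of
 * `dsa_hdr_len` bytes fits within the configured maximum. */
static bool frame_accepted(unsigned int frame_len, unsigned int dsa_hdr_len,
			   unsigned int max_fl)
{
	assert(max_fl <= SKETCH_MAX_FL_FIELD);
	return frame_len + dsa_hdr_len <= max_fl;
}
```

A 1522-byte frame with an 8-byte switch header is 1530 bytes, which exceeds the old limit but fits comfortably under the new limit of 1984.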
    • A
      net: fec: Issue error for missing but expected PHY · 9558df3a
      Andrew Lunn committed on
      If the PHY is missing but expected, e.g. because of a typ0 in the dt
      file, it is not possible to open the interface. ip link returns:
      
      RTNETLINK answers: No such device
      
      It is not very obvious what the problem is. Add a netdev_err() in this
      case to make it easier to debug the issue.
      
      [   21.409385] fec 2188000.ethernet eth0: Unable to connect to phy
      RTNETLINK answers: No such device
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9558df3a
    • D
      Merge branch 'dsa-lan9303-Fix-MDIO-issues' · 509394e8
      David S. Miller committed on
      Egil Hjelmeland says:
      
      ====================
      net: dsa: lan9303: Fix MDIO issues.
      
      This series fixes the MDIO interface for the lan9303 DSA driver.
      Bugs found after testing on actual HW.
      
      This series is extracted from the first patch of my first large
      series. Significant changes from that version are:
       - use mdiobus_write_nested, mdiobus_read_nested.
       - EXPORT lan9303_indirect_phy_ops
      
      Unfortunately I do not have access to an I2C-based system for
      testing.
      
      Changes from first version:
       - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      509394e8
    • E
      net: dsa: lan9303: MDIO access phy registers directly · 2c340898
      Egil Hjelmeland committed on
      Indirect access (PMI) to PHY registers only works in I2C mode. In
      MDIO mode, PHY registers must be accessed directly. Introduced
      struct lan9303_phy_ops to handle the two modes.
      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c340898
    • E
      net: dsa: lan9303: Renamed indirect phy access functions · 9e866e5d
      Egil Hjelmeland committed on
      Preparing for the following fix of MDIO phy access:
      
      Renamed functions that access PHY 1 and 2 indirectly through PMI
      registers.
      
       lan9303_port_phy_reg_wait_for_completion() to
       lan9303_indirect_phy_wait_for_completion()
      
       lan9303_port_phy_reg_read() to
       lan9303_indirect_phy_read()
      
       lan9303_port_phy_reg_write() to
       lan9303_indirect_phy_write()
      
      Also changed "val" parameter of lan9303_indirect_phy_write() to u16,
      for clarity.
      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e866e5d
    • E
      net: dsa: lan9303: Multiply by 4 to get MDIO register · ab78acb1
      Egil Hjelmeland committed on
      lan9303_mdio_write()/_read() must multiply register number by 4 to get
      offset.

      Added some comments to the register definitions.
      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab78acb1
    • E
      net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO · d329ac88
      Egil Hjelmeland committed on
      Handle the case where an MDIO read with no response returns 0xffff.
      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d329ac88
  3. 30 Jul, 2017 8 commits
    • D
      Merge branch 'ethtool-fec' · 736b9b9c
      David S. Miller committed on
      Roopa Prabhu says:
      
      ====================
      ethtool: support for forward error correction mode setting on a link
      
      Forward Error Correction (FEC) modes, i.e. Base-R
      and Reed-Solomon modes, are introduced in the 25G/40G/100G standards
      to provide good BER at high speeds. Various networking devices
      which support 25G/40G/100G provide the ability to manage supported FEC
      modes, and the lack of FEC encoding control and reporting today is a
      source of interoperability issues for many vendors.
      FEC capability as well as a specific FEC mode, i.e. Base-R
      or RS, can be requested or advertised through bits D44:47 of the base link
      codeword.

      This patch set intends to provide an option under ethtool to manage and
      report FEC encoding settings for networking devices as per IEEE 802.3
      bj, bm and by specs.
      
      v2 :
              - minor patch format fixes and typos pointed out by Andrew
              - there was a pending discussion on the use of 'auto' vs
                'automatic' for fec settings. I have left it as 'auto'
                because in most cases today auto is used in place of
                automatic to represent automatically generated values.
                We use it in other networking config too. I would prefer
                leaving it as auto.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      736b9b9c
    • C
    • C
    • V
      net: ethtool: add support for forward error correction modes · 1a5f3da2
      Vidya Sagar Ravipati committed on
      Forward Error Correction (FEC) modes, i.e. Base-R
      and Reed-Solomon modes, are introduced in the 25G/40G/100G standards
      to provide good BER at high speeds. Various networking devices
      which support 25G/40G/100G provide the ability to manage supported FEC
      modes, and the lack of FEC encoding control and reporting today is a
      source of interoperability issues for many vendors.
      FEC capability as well as a specific FEC mode, i.e. Base-R
      or RS, can be requested or advertised through bits D44:47 of
      the base link codeword.

      This patch set intends to provide an option under ethtool to manage
      and report FEC encoding settings for networking devices as per
      IEEE 802.3 bj, bm and by specs.
      
      set-fec/show-fec option(s) are designed to provide control and
      report the FEC encoding on the link.
      
      SET FEC option:
      root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]
      
      Encoding: Types of encoding
      Off    :  Turning off any encoding
      RS     :  enforcing RS-FEC encoding on supported speeds
      BaseR  :  enforcing Base R encoding on supported speeds
      Auto   :  IEEE defaults for the speed/medium combination
      
      Here are a few examples of what we would expect if encoding=auto:
      - if autoneg is on, we expect FEC to be negotiated as on or off
        as long as the protocol supports it
      - if the hardware is capable of detecting the FEC encoding on its
        receiver, it will reconfigure its encoder to match
      - in the absence of the above, the configuration would be set to IEEE
        defaults.

      From our understanding, this is essentially what most hardware/driver
      combinations are doing today in the absence of a way for users to
      control the behavior.
      
      SHOW FEC option:
      root@tor: ethtool --show-fec  swp1
      FEC parameters for swp1:
      Active FEC encodings: RS
      Configured FEC encodings:  RS | BaseR
      
      ETHTOOL DEVNAME output modification:
      
      ethtool devname output:
      root@tor:~# ethtool swp1
      Settings for swp1:
      root@hpe-7712-03:~# ethtool swp18
      Settings for swp18:
          Supported ports: [ FIBRE ]
          Supported link modes:   40000baseCR4/Full
                                  40000baseSR4/Full
                                  40000baseLR4/Full
                                  100000baseSR4/Full
                                  100000baseCR4/Full
                                  100000baseLR4_ER4/Full
          Supported pause frame use: No
          Supports auto-negotiation: Yes
          Supported FEC modes: [RS | BaseR | None | Not reported]
          Advertised link modes:  Not reported
          Advertised pause frame use: No
          Advertised auto-negotiation: No
          Advertised FEC modes: [RS | BaseR | None | Not reported]
      <<<< One or more FEC modes
          Speed: 100000Mb/s
          Duplex: Full
          Port: FIBRE
          PHYAD: 106
          Transceiver: internal
          Auto-negotiation: off
          Link detected: yes
      
      This patch includes the following changes:
      a) New ETHTOOL_GFECPARAM/ETHTOOL_SFECPARAM API, handled by
        the new get_fecparam/set_fecparam callbacks, provides support
        for configuration of forward error correction modes.
      b) Link mode bits for FEC modes, i.e. None (no FEC mode), RS, BaseR/FC,
        are defined so that users can configure these FEC modes for the supported
        and advertising fields as part of link autonegotiation.
      Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com>
      Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a5f3da2
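The `encoding [off | RS | BaseR | auto]` option maps naturally onto a small bitmask, which also explains how "Configured FEC encodings: RS | BaseR" can report more than one mode. The sketch below is a hypothetical user-space parser: the `SKETCH_FEC_*` values and `sketch_parse_fec()` are illustrative names and bit assignments, not the constants defined by the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit assignments, one bit per encoding choice. */
enum sketch_fec {
	SKETCH_FEC_UNKNOWN = 0,
	SKETCH_FEC_AUTO    = 1 << 0,
	SKETCH_FEC_OFF     = 1 << 1,
	SKETCH_FEC_RS      = 1 << 2,
	SKETCH_FEC_BASER   = 1 << 3,
};

/* Map one command-line token to its bit. Matching is case-sensitive
 * here to keep the sketch simple; returns SKETCH_FEC_UNKNOWN (0) for
 * anything unrecognized so the caller can reject it. */
static unsigned int sketch_parse_fec(const char *name)
{
	assert(name != NULL);
	if (strcmp(name, "auto") == 0)
		return SKETCH_FEC_AUTO;
	if (strcmp(name, "off") == 0)
		return SKETCH_FEC_OFF;
	if (strcmp(name, "RS") == 0)
		return SKETCH_FEC_RS;
	if (strcmp(name, "BaseR") == 0)
		return SKETCH_FEC_BASER;
	return SKETCH_FEC_UNKNOWN;
}
```

Several configured encodings can then be carried in one word by OR-ing the bits, for example `sketch_parse_fec("RS") | sketch_parse_fec("BaseR")`.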
    • D
      Merge branch 'netvsc-minor-fixes-and-optimization' · fe21b269
      David S. Miller committed on
      Stephen Hemminger says:
      
      ====================
      netvsc: minor fixes and optimization
      
      This is a subset of earlier submission with a few more fixes
      found during testing. There are two small optimizations, one is to
      better manage the receive completion ring, and the other is removing
      one unneeded level of indirection.
      
      Will submit the improved VF support and buffer sizing in a later
      patch so they get more review.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe21b269
    • S
      netvsc: signal host if receive ring is emptied · f4e40363
      stephen hemminger committed on
      Latency improvement related to NAPI conversion.
      If all packets are processed from the receive ring, then the host
      needs to be signaled.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4e40363
    • S
      netvsc: fix error unwind on device setup failure · 49393347
      stephen hemminger committed on
      If setting receive buffer fails, the error unwind would cause
      kernel panic because it was not correctly doing RCU and NAPI
      unwind.  RCU'd pointer needs to be reset to NULL, and NAPI needs
      to be disabled not deleted.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49393347
    • S
      netvsc: optimize receive completions · 7426b1a5
      stephen hemminger committed on
      Optimize how receive completion rings are managed.
         * Allocate only as many slots as needed for all buffers from host
         * Allocate before setting up sub channel for better error detection
         * Don't need to keep copy of initial receive section message
         * Precompute the watermark for when receive flushing is needed
         * Replace division with conditional test
         * Replace atomic per-device variable with per-channel check.
         * Handle corner case where receive completion send
           fails if ring buffer to host is full.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7426b1a5