1. 04 8月, 2017 2 次提交
  2. 03 8月, 2017 2 次提交
  3. 02 8月, 2017 6 次提交
  4. 01 8月, 2017 8 次提交
  5. 31 7月, 2017 3 次提交
    • J
      net sched actions: add time filter for action dumping · e62e484d
      Jamal Hadi Salim 提交于
      This patch adds support for filtering based on time since last used.
      When we are dumping a large number of actions it is useful to
      have the option of filtering based on when the action was last
      used to reduce the amount of data crossing to user space.
      
      With this patch the user space app sets the TCA_ROOT_TIME_DELTA
      attribute with the value in milliseconds with "time of interest
      since now".  The kernel converts this to jiffies and does the
      filtering comparison matching entries that have seen activity
      since then and returns them to user space.
      Old kernels and old tc continue to work in legacy mode since
      they dont specify this attribute.
      
      Some example (we have 400 actions bound to 400 filters); at
      installation time. Using updated when tc setting the time of
      interest to 120 seconds earlier (we see 400 actions):
      prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
      400
      
      go get some coffee and wait for > 120 seconds and try again:
      
      prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
      0
      
      Lets see a filter bound to one of these actions:
      ....
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
        match 7f000002/ffffffff at 12 (success 1 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1145 sec used 802 sec
          Action statistics:
          Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      ....
      
      that coffee took long, no? It was good.
      
      Now lets ping -c 1 127.0.0.2, then run the actions again:
      prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
      1
      
      More details please:
      prompt$ hackedtc -s actions ls action gact since 120000
      
          action order 0: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1270 sec used 30 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      
      And the filter?
      
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
        match 7f000002/ffffffff at 12 (success 2 )
          action order 1: gact action pass
           random type none pass val 0
           index 23 ref 2 bind 1 installed 1324 sec used 84 sec
          Action statistics:
          Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e62e484d
    • J
      net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch · 90825b23
      Jamal Hadi Salim 提交于
      When you dump hundreds of thousands of actions, getting only 32 per
      dump batch even when the socket buffer and memory allocations allow
      is inefficient.
      
      With this change, the user will get as many as possibly fitting
      within the given constraints available to the kernel.
      
      The top level action TLV space is extended. An attribute
      TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
      is set by the user indicating the user is capable of processing
      these large dumps. Older user space which doesnt set this flag
      doesnt get the large (than 32) batches.
      The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
      actions are put in a single batch. As such user space app knows how long
      to iterate (independent of the type of action being dumped)
      instead of hardcoded maximum of 32 thus maintaining backward compat.
      
      Some results dumping 1.5M actions below:
      first an unpatched tc which doesnt understand these features...
      
      prompt$ time -p tc actions ls action gact | grep index | wc -l
      1500000
      real 1388.43
      user 2.07
      sys 1386.79
      
      Now lets see a patched tc which sets the correct flags when requesting
      a dump:
      
      prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
      1500000
      real 178.13
      user 2.02
      sys 176.96
      
      That is about 8x performance improvement for tc app which sets its
      receive buffer to about 32K.
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90825b23
    • J
      net netlink: Add new type NLA_BITFIELD32 · 64c83d83
      Jamal Hadi Salim 提交于
      Generic bitflags attribute content sent to the kernel by user.
      With this netlink attr type the user can either set or unset a
      flag in the kernel.
      
      The value is a bitmap that defines the bit values being set
      The selector is a bitmask that defines which value bit is to be
      considered.
      
      A check is made to ensure the rules that a kernel subsystem always
      conforms to bitflags the kernel already knows about. i.e
      if the user tries to set a bit flag that is not understood then
      the _it will be rejected_.
      
      In the most basic form, the user specifies the attribute policy as:
      [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
      
      where myvalidflags is the bit mask of the flags the kernel understands.
      
      If the user _does not_ provide myvalidflags then the attribute will
      also be rejected.
      
      Examples:
      value = 0x0, and selector = 0x1
      implies we are selecting bit 1 and we want to set its value to 0.
      
      value = 0x2, and selector = 0x2
      implies we are selecting bit 2 and we want to set its value to 1.
      Suggested-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64c83d83
  6. 30 7月, 2017 2 次提交
    • V
      net: ethtool: add support for forward error correction modes · 1a5f3da2
      Vidya Sagar Ravipati 提交于
      Forward Error Correction (FEC) modes i.e Base-R
      and Reed-Solomon modes are introduced in 25G/40G/100G standards
      for providing good BER at high speeds. Various networking devices
      which support 25G/40G/100G provides ability to manage supported FEC
      modes and the lack of FEC encoding control and reporting today is a
      source for interoperability issues for many vendors.
      FEC capability as well as specific FEC mode i.e. Base-R
      or RS modes can be requested or advertised through bits D44:47 of
      base link codeword.
      
      This patch set intends to provide option under ethtool to manage
      and report FEC encoding settings for networking devices as per
      IEEE 802.3 bj, bm and by specs.
      
      set-fec/show-fec option(s) are designed to provide control and
      report the FEC encoding on the link.
      
      SET FEC option:
      root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]
      
      Encoding: Types of encoding
      Off    :  Turning off any encoding
      RS     :  enforcing RS-FEC encoding on supported speeds
      BaseR  :  enforcing Base R encoding on supported speeds
      Auto   :  IEEE defaults for the speed/medium combination
      
      Here are a few examples of what we would expect if encoding=auto:
      - if autoneg is on, we are  expecting FEC to be negotiated as on or off
        as long as protocol supports it
      - if the hardware is capable of detecting the FEC encoding on it's
            receiver it will reconfigure its encoder to match
      - in absence of the above, the configuration would be set to IEEE
        defaults.
      
      >From our  understanding , this is essentially what most hardware/driver
      combinations are doing today in the absence of a way for users to
      control the behavior.
      
      SHOW FEC option:
      root@tor: ethtool --show-fec  swp1
      FEC parameters for swp1:
      Active FEC encodings: RS
      Configured FEC encodings:  RS | BaseR
      
      ETHTOOL DEVNAME output modification:
      
      ethtool devname output:
      root@tor:~# ethtool swp1
      Settings for swp1:
      root@hpe-7712-03:~# ethtool swp18
      Settings for swp18:
          Supported ports: [ FIBRE ]
          Supported link modes:   40000baseCR4/Full
                                  40000baseSR4/Full
                                  40000baseLR4/Full
                                  100000baseSR4/Full
                                  100000baseCR4/Full
                                  100000baseLR4_ER4/Full
          Supported pause frame use: No
          Supports auto-negotiation: Yes
          Supported FEC modes: [RS | BaseR | None | Not reported]
          Advertised link modes:  Not reported
          Advertised pause frame use: No
          Advertised auto-negotiation: No
          Advertised FEC modes: [RS | BaseR | None | Not reported]
      <<<< One or more FEC modes
          Speed: 100000Mb/s
          Duplex: Full
          Port: FIBRE
          PHYAD: 106
          Transceiver: internal
          Auto-negotiation: off
          Link detected: yes
      
      This patch includes following changes
      a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
        the new get_fecparam/set_fecparam callbacks, provides support
        for configuration of forward error correction modes.
      b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
        are defined so that users can configure these fec modes for supported
        and advertising fields as part of link autonegotiation.
      Signed-off-by: NVidya Sagar Ravipati <vidya.chowdary@gmail.com>
      Signed-off-by: NDustin Byford <dustin@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a5f3da2
    • P
      udp6: fix socket leak on early demux · c9f2c1ae
      Paolo Abeni 提交于
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward allocation errors
      and no UDP skbs (both ipv4 and ipv6) will be able to reach the
      user-space.
      
      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage.
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now obsoleted comment about "socket cache".
      
      The newly added code is derived from the current ipv4 code for the
      similar path.
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: NSam Edwards <CFSworks@gmail.com>
      Reported-by: NMarc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9f2c1ae
  7. 27 7月, 2017 11 次提交
  8. 26 7月, 2017 4 次提交
    • M
      net: phy: Remove trailing semicolon in macro definition · 2eaa38d9
      Marc Gonzalez 提交于
      Commit e5a03bfd ("phy: Add an mdio_device structure")
      introduced a spurious trailing semicolon. Remove it.
      Signed-off-by: NMarc Gonzalez <marc_gonzalez@sigmadesigns.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2eaa38d9
    • T
      workqueue: implicit ordered attribute should be overridable · 0a94efb5
      Tejun Heo 提交于
      5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
      ordered") automatically enabled ordered attribute for unbound
      workqueues w/ max_active == 1.  Because ordered workqueues reject
      max_active and some attribute changes, this implicit ordered mode
      broke cases where the user creates an unbound workqueue w/ max_active
      == 1 and later explicitly changes the related attributes.
      
      This patch distinguishes explicit and implicit ordered setting and
      overrides from attribute changes if implict.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: 5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
      0a94efb5
    • P
      udp: preserve head state for IP_CMSG_PASSSEC · dce4551c
      Paolo Abeni 提交于
      Paul Moore reported a SELinux/IP_PASSSEC regression
      caused by missing skb->sp at recvmsg() time. We need to
      preserve the skb head state to process the IP_CMSG_PASSSEC
      cmsg.
      
      With this commit we avoid releasing the skb head state in the
      BH even if a secpath is attached to the current skb, and stores
      the skb status (with/without head states) in the scratch area,
      so that we can access it at skb deallocation time, without
      incurring in cache-miss penalties.
      
      This also avoids misusing the skb CB for ipv6 packets,
      as introduced by the commit 0ddf3fb2 ("udp: preserve
      skb->dst if required for IP options processing").
      
      Clean a bit the scratch area helpers implementation, to
      reduce the code differences between 32 and 64 bits build.
      Reported-by: NPaul Moore <paul@paul-moore.com>
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Tested-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dce4551c
    • J
      nvme-fc: revise TRADDR parsing · 9c5358e1
      James Smart 提交于
      The FC-NVME spec hasn't locked down on the format string for TRADDR.
      Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
      where the wwn's are hex values but not prefixed by 0x.
      
      Most implementations so far expect a string format of
      "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
      uses the match_u64 parser which requires a leading 0x prefix to set
      the base properly. If it's not there, a match will either fail or return
      a base 10 value.
      
      The resolution in T11 is pushing out. Therefore, to fix things now and
      to cover any eventuality and any implementations already in the field,
      this patch adds support for both formats.
      
      The change consists of replacing the token matching routine with a
      routine that validates the fixed string format, and then builds
      a local copy of the hex name with a 0x prefix before calling
      the system parser.
      
      Note: the same parser routine exists in both the initiator and target
      transports. Given this is about the only "shared" item, we chose to
      replicate rather than create an interdendency on some shared code.
      Signed-off-by: NJames Smart <james.smart@broadcom.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      9c5358e1
  9. 25 7月, 2017 2 次提交